diff mbox series

memcg/memcontrol04: Fix the judgment error in test_memcg_low()

Message ID CA+B+MYQPoqtKrnTiVsqyJEvo1_=r3-EJqCQT+RxqfaSG0xXgYw@mail.gmail.com
State Changes Requested
Headers show
Series memcg/memcontrol04: Fix the judgment error in test_memcg_low() | expand

Commit Message

Jin Guojie Nov. 21, 2024, 3:05 a.m. UTC
When running memcontrol04, TFAIL results will appear on various Linux
distributions, kernel versions, and CPUs:

(1) Test platform

* Linux distribution: Ubuntu 24.10
* CPU: X86_64, Arm64
* Kernel: 6.6 longterm
* glibc:  2.40
* LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024

(2) Error logs

During the operation of memcontrol04, file systems such as ext2, ext3,
ext4, xfs, ntfs, and vfat will be tested.
For any of the file system, the same TFAIL result will appear:

root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04

tst_test.c:1823: TINFO: === Testing on ext2 ===
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0

tst_test.c:1823: TINFO: === Testing on ext3 ===
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0

......

Summary:
passed   55
failed   5
broken   0
skipped  0
warnings 0

It looks like there is an error in the processing logic of cgroup F.

(3) Cause analysis

In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
it is successful at the end, only E should be judged to have low_events==0,
and the judgment conditions for all other subgroups should be low_events > 0.

(4) Fix issure

#1209
https://github.com/linux-test-project/ltp/issues/1209

Signed-off-by: Jin Guojie <guojie.jin@gmail.com>

---
 testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.45.2

Comments

Andrea Cervesato Nov. 21, 2024, 9:44 a.m. UTC | #1
Hi!

Thanks for checking this test, but did you take a look at the 
explanation given in the test "tags" ? -> 
https://bugzilla.suse.com/show_bug.cgi?id=1196298

Regards,
Andrea

On 11/21/24 04:05, Jin Guojie wrote:
> When running memcontrol04, TFAIL results will appear on various Linux
> distributions, kernel versions, and CPUs:
>
> (1) Test platform
>
> * Linux distribution: Ubuntu 24.10
> * CPU: X86_64, Arm64
> * Kernel: 6.6 longterm
> * glibc:  2.40
> * LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024
>
> (2) Error logs
>
> During the operation of memcontrol04, file systems such as ext2, ext3,
> ext4, xfs, ntfs, and vfat will be tested.
> For any of the file system, the same TFAIL result will appear:
>
> root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
>
> tst_test.c:1823: TINFO: === Testing on ext2 ===
> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0
>
> tst_test.c:1823: TINFO: === Testing on ext3 ===
> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0
>
> ......
>
> Summary:
> passed   55
> failed   5
> broken   0
> skipped  0
> warnings 0
>
> It looks like there is an error in the processing logic of cgroup F.
>
> (3) Cause analysis
>
> In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
> and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
> it is successful at the end, only E should be judged to have low_events==0,
> and the judgment conditions for all other subgroups should be low_events > 0.
>
> (4) Fix issure
>
> #1209
> https://github.com/linux-test-project/ltp/issues/1209
>
> Signed-off-by: Jin Guojie <guojie.jin@gmail.com>
>
> ---
>   testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
> b/testcases/kernel/controllers/memcg/memcontrol04.c
> index 1b8d115f8..0dddb7449 100644
> --- a/testcases/kernel/controllers/memcg/memcontrol04.c
> +++ b/testcases/kernel/controllers/memcg/memcontrol04.c
> @@ -207,7 +207,7 @@ static void test_memcg_low(void)
>
>                  TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);
>
> -               if (i < E) {
> +               if (i != E) {
>                          TST_EXP_EXPR(low > 0,
>                                       "(%c low events=%ld) > 0", id, low);
>                  } else {
> --
> 2.45.2
>
Jin Guojie Nov. 21, 2024, 1:18 p.m. UTC | #2
When I visit https://bugzilla.suse.com/show_bug.cgi?id=1196298, an
error occurs showing that

"You are not authorized to access bug #1196298."

I tried to register an account by myself, but the error did not change.

So far, I still can't view the link.

Please see if there is a way to give me permission to view this bug.

Or, could you please provide me the explanation of this bug in the email?

On Thu, Nov 21, 2024 at 5:44 PM Andrea Cervesato
<andrea.cervesato@suse.com> wrote:
>
> Hi!
>
> Thanks for checking this test, but did you take a look at the
> explanation given in the test "tags" ? ->
> https://bugzilla.suse.com/show_bug.cgi?id=1196298
>
> Regards,
> Andrea
>
> On 11/21/24 04:05, Jin Guojie wrote:
> > When running memcontrol04, TFAIL results will appear on various Linux
> > distributions, kernel versions, and CPUs:
> >
> > (1) Test platform
> >
> > * Linux distribution: Ubuntu 24.10
> > * CPU: X86_64, Arm64
> > * Kernel: 6.6 longterm
> > * glibc:  2.40
> > * LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024
> >
> > (2) Error logs
> >
> > During the operation of memcontrol04, file systems such as ext2, ext3,
> > ext4, xfs, ntfs, and vfat will be tested.
> > For any of the file system, the same TFAIL result will appear:
> >
> > root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
> >
> > tst_test.c:1823: TINFO: === Testing on ext2 ===
> > memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> > memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> > memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> > memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> > memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> > memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> > memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> > memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0
> >
> > tst_test.c:1823: TINFO: === Testing on ext3 ===
> > memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> > memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> > memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> > memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> > memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> > memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> > memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> > memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0
> >
> > ......
> >
> > Summary:
> > passed   55
> > failed   5
> > broken   0
> > skipped  0
> > warnings 0
> >
> > It looks like there is an error in the processing logic of cgroup F.
> >
> > (3) Cause analysis
> >
> > In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
> > and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
> > it is successful at the end, only E should be judged to have low_events==0,
> > and the judgment conditions for all other subgroups should be low_events > 0.
> >
> > (4) Fix issure
> >
> > #1209
> > https://github.com/linux-test-project/ltp/issues/1209
> >
> > Signed-off-by: Jin Guojie <guojie.jin@gmail.com>
> >
> > ---
> >   testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
> > b/testcases/kernel/controllers/memcg/memcontrol04.c
> > index 1b8d115f8..0dddb7449 100644
> > --- a/testcases/kernel/controllers/memcg/memcontrol04.c
> > +++ b/testcases/kernel/controllers/memcg/memcontrol04.c
> > @@ -207,7 +207,7 @@ static void test_memcg_low(void)
> >
> >                  TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);
> >
> > -               if (i < E) {
> > +               if (i != E) {
> >                          TST_EXP_EXPR(low > 0,
> >                                       "(%c low events=%ld) > 0", id, low);
> >                  } else {
> > --
> > 2.45.2
> >
Andrea Cervesato Nov. 25, 2024, 9:45 a.m. UTC | #3
Hi Jin,

I can't share the bugzilla content, but the test has been analyzed quite 
a lot and people didn't figure out a proper fix for it.
Linux is highly asynchronous and the memory management is not always 
predictable. Most of the times, testing memory requires a good knowledge 
of the underlying algorithms and, even so, it's not so obvious to write 
tests for it. Sometimes we end up having a test result based on 
**average reply** from the test (i.e. if we obtain TPASS 70 times out of 
100, we can consider test pass).

As far as I see, devs think this test should be rewritten or simplified, 
in order to reduce the noise in LTP, testing only features which can be 
actually tested.

Feel free to take a look if you think it's worth the energy, but I also 
suggest to take a look at the kernel, because it could be there's an 
underlying bug somewhere.

Kind regards,
Andrea Cervesato

On 11/21/24 14:18, Jin Guojie wrote:
> When I visit https://bugzilla.suse.com/show_bug.cgi?id=1196298, an
> error occurs showing that
>
> "You are not authorized to access bug #1196298."
>
> I tried to register an account by myself, but the error did not change.
>
> So far, I still can't view the link.
>
> Please see if there is a way to give me permission to view this bug.
>
> Or, could you please provide me the explanation of this bug in the email?
>
> On Thu, Nov 21, 2024 at 5:44 PM Andrea Cervesato
> <andrea.cervesato@suse.com> wrote:
>> Hi!
>>
>> Thanks for checking this test, but did you take a look at the
>> explanation given in the test "tags" ? ->
>> https://bugzilla.suse.com/show_bug.cgi?id=1196298
>>
>> Regards,
>> Andrea
>>
>> On 11/21/24 04:05, Jin Guojie wrote:
>>> When running memcontrol04, TFAIL results will appear on various Linux
>>> distributions, kernel versions, and CPUs:
>>>
>>> (1) Test platform
>>>
>>> * Linux distribution: Ubuntu 24.10
>>> * CPU: X86_64, Arm64
>>> * Kernel: 6.6 longterm
>>> * glibc:  2.40
>>> * LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024
>>>
>>> (2) Error logs
>>>
>>> During the operation of memcontrol04, file systems such as ext2, ext3,
>>> ext4, xfs, ntfs, and vfat will be tested.
>>> For any of the file system, the same TFAIL result will appear:
>>>
>>> root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
>>>
>>> tst_test.c:1823: TINFO: === Testing on ext2 ===
>>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
>>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
>>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
>>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
>>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
>>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
>>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
>>> memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0
>>>
>>> tst_test.c:1823: TINFO: === Testing on ext3 ===
>>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
>>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
>>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
>>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
>>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
>>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
>>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
>>> memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0
>>>
>>> ......
>>>
>>> Summary:
>>> passed   55
>>> failed   5
>>> broken   0
>>> skipped  0
>>> warnings 0
>>>
>>> It looks like there is an error in the processing logic of cgroup F.
>>>
>>> (3) Cause analysis
>>>
>>> In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
>>> and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
>>> it is successful at the end, only E should be judged to have low_events==0,
>>> and the judgment conditions for all other subgroups should be low_events > 0.
>>>
>>> (4) Fix issure
>>>
>>> #1209
>>> https://github.com/linux-test-project/ltp/issues/1209
>>>
>>> Signed-off-by: Jin Guojie <guojie.jin@gmail.com>
>>>
>>> ---
>>>    testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
>>> b/testcases/kernel/controllers/memcg/memcontrol04.c
>>> index 1b8d115f8..0dddb7449 100644
>>> --- a/testcases/kernel/controllers/memcg/memcontrol04.c
>>> +++ b/testcases/kernel/controllers/memcg/memcontrol04.c
>>> @@ -207,7 +207,7 @@ static void test_memcg_low(void)
>>>
>>>                   TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);
>>>
>>> -               if (i < E) {
>>> +               if (i != E) {
>>>                           TST_EXP_EXPR(low > 0,
>>>                                        "(%c low events=%ld) > 0", id, low);
>>>                   } else {
>>> --
>>> 2.45.2
>>>
Jin Guojie Nov. 25, 2024, 1:19 p.m. UTC | #4
Hi Andera,

Thank you for the reply.

Here are some additional comments,

1. In the function test_memcg_low(), due to the judgment of if (i ==
E), it is obvious that no memory is allocated for E. For other groups,
50MB memory is allocated.

---- test_memcg_low() ----
for (i = 0; i < ARRAY_SIZE(leaf_cg); i++) { // leaf_cg[] contains C,D,E,F
    leaf_cg[i] = tst_cg_group_mk(trunk_cg[B], "leaf_%c", 'C' + i);

    if (i == E) // no memory allocated in E
        continue;

    alloc_pagecache_in_child(leaf_cg[i], MB(50)); // 50MB allocated for C, D, F
}

Therefore, it's obvious that the change from if (i < E) to if (i != E)
in my patch is consistent with the memory allocation .

2. Since this problem is relatively clear and reproducible in various
distributions, it should be easy to check if someone has discussed any
topics related to my patch.

Could you copy the text in
https://bugzilla.suse.com/show_bug.cgi?id=1196298 and send it to me
separately?

Kind regards,

Jin Guojie

On Mon, Nov 25, 2024 at 5:45 PM Andrea Cervesato
<andrea.cervesato@suse.com> wrote:
>
> Hi Jin,
>
> I can't share the bugzilla content, but the test has been analyzed quite
> a lot and people didn't figure out a proper fix for it.
> Linux is highly asynchronous and the memory management is not always
> predictable. Most of the times, testing memory requires a good knowledge
> of the underlying algorithms and, even so, it's not so obvious to write
> tests for it. Sometimes we end up having a test result based on
> **average reply** from the test (i.e. if we obtain TPASS 70 times out of
> 100, we can consider test pass).
>
> As far as I see, devs think this test should be rewritten or simplified,
> in order to reduce the noise in LTP, testing only features which can be
> actually tested.
>
> Feel free to take a look if you think it's worth the energy, but I also
> suggest to take a look at the kernel, because it could be there's an
> underlying bug somewhere.
>
> Kind regards,
> Andrea Cervesato
>
> On 11/21/24 14:18, Jin Guojie wrote:
> > When I visit https://bugzilla.suse.com/show_bug.cgi?id=1196298, an
> > error occurs showing that
> >
> > "You are not authorized to access bug #1196298."
> >
> > I tried to register an account by myself, but the error did not change.
> >
> > So far, I still can't view the link.
> >
> > Please see if there is a way to give me permission to view this bug.
> >
> > Or, could you please provide me the explanation of this bug in the email?
> >
> > On Thu, Nov 21, 2024 at 5:44 PM Andrea Cervesato
> > <andrea.cervesato@suse.com> wrote:
> >> Hi!
> >>
> >> Thanks for checking this test, but did you take a look at the
> >> explanation given in the test "tags" ? ->
> >> https://bugzilla.suse.com/show_bug.cgi?id=1196298
> >>
> >> Regards,
> >> Andrea
> >>
> >> On 11/21/24 04:05, Jin Guojie wrote:
> >>> When running memcontrol04, TFAIL results will appear on various Linux
> >>> distributions, kernel versions, and CPUs:
> >>>
> >>> (1) Test platform
> >>>
> >>> * Linux distribution: Ubuntu 24.10
> >>> * CPU: X86_64, Arm64
> >>> * Kernel: 6.6 longterm
> >>> * glibc:  2.40
> >>> * LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024
> >>>
> >>> (2) Error logs
> >>>
> >>> During the operation of memcontrol04, file systems such as ext2, ext3,
> >>> ext4, xfs, ntfs, and vfat will be tested.
> >>> For any of the file system, the same TFAIL result will appear:
> >>>
> >>> root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
> >>>
> >>> tst_test.c:1823: TINFO: === Testing on ext2 ===
> >>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> >>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> >>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> >>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> >>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> >>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> >>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> >>> memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0
> >>>
> >>> tst_test.c:1823: TINFO: === Testing on ext3 ===
> >>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
> >>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
> >>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
> >>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
> >>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
> >>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
> >>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
> >>> memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0
> >>>
> >>> ......
> >>>
> >>> Summary:
> >>> passed   55
> >>> failed   5
> >>> broken   0
> >>> skipped  0
> >>> warnings 0
> >>>
> >>> It looks like there is an error in the processing logic of cgroup F.
> >>>
> >>> (3) Cause analysis
> >>>
> >>> In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
> >>> and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
> >>> it is successful at the end, only E should be judged to have low_events==0,
> >>> and the judgment conditions for all other subgroups should be low_events > 0.
> >>>
> >>> (4) Fix issure
> >>>
> >>> #1209
> >>> https://github.com/linux-test-project/ltp/issues/1209
> >>>
> >>> Signed-off-by: Jin Guojie <guojie.jin@gmail.com>
> >>>
> >>> ---
> >>>    testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
> >>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
> >>> b/testcases/kernel/controllers/memcg/memcontrol04.c
> >>> index 1b8d115f8..0dddb7449 100644
> >>> --- a/testcases/kernel/controllers/memcg/memcontrol04.c
> >>> +++ b/testcases/kernel/controllers/memcg/memcontrol04.c
> >>> @@ -207,7 +207,7 @@ static void test_memcg_low(void)
> >>>
> >>>                   TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);
> >>>
> >>> -               if (i < E) {
> >>> +               if (i != E) {
> >>>                           TST_EXP_EXPR(low > 0,
> >>>                                        "(%c low events=%ld) > 0", id, low);
> >>>                   } else {
> >>> --
> >>> 2.45.2
> >>>
Andrea Cervesato Dec. 2, 2024, 8:45 a.m. UTC | #5
Hi Jin,

Sorry for the late reply but I was sick. If I understand the test right, 
the memory pressure on the leaves should change the amount of memory 
allocated by A/B/F . The expected result is to have 0 memory allocation 
in A/B/F , due to the memory.low boundary, which is the target of our test.

As it's written in the test description, the purpose of the test is to 
verify that memory.low boundary works correctly during memory pressure. 
To sum up the ticket, this normally happens, unless we use 
"memory_recursiveprot" for cgroups. It seems cgroup A/B/F folder is 
unjustifiably protecting its memory.

This is probably a test that verifies an unrealistic and an unpractical 
situation, so the testing model should be changed.

Kind regards,
Andrea Cervesato

On 11/25/24 14:19, Jin Guojie wrote:
> Hi Andera,
>
> Thank you for the reply.
>
> Here are some additional comments,
>
> 1. In the function test_memcg_low(), due to the judgment of if (i ==
> E), it is obvious that no memory is allocated for E. For other groups,
> 50MB memory is allocated.
>
> ---- test_memcg_low() ----
> for (i = 0; i < ARRAY_SIZE(leaf_cg); i++) { // leaf_cg[] contains C,D,E,F
>      leaf_cg[i] = tst_cg_group_mk(trunk_cg[B], "leaf_%c", 'C' + i);
>
>      if (i == E) // no memory allocated in E
>          continue;
>
>      alloc_pagecache_in_child(leaf_cg[i], MB(50)); // 50MB allocated for C, D, F
> }
>
> Therefore, it's obvious that the change from if (i < E) to if (i != E)
> in my patch is consistent with the memory allocation .
>
> 2. Since this problem is relatively clear and reproducible in various
> distributions, it should be easy to check if someone has discussed any
> topics related to my patch.
>
> Could you copy the text in
> https://bugzilla.suse.com/show_bug.cgi?id=1196298 and send it to me
> separately?
>
> Kind regards,
>
> Jin Guojie
>
> On Mon, Nov 25, 2024 at 5:45 PM Andrea Cervesato
> <andrea.cervesato@suse.com> wrote:
>> Hi Jin,
>>
>> I can't share the bugzilla content, but the test has been analyzed quite
>> a lot and people didn't figure out a proper fix for it.
>> Linux is highly asynchronous and the memory management is not always
>> predictable. Most of the times, testing memory requires a good knowledge
>> of the underlying algorithms and, even so, it's not so obvious to write
>> tests for it. Sometimes we end up having a test result based on
>> **average reply** from the test (i.e. if we obtain TPASS 70 times out of
>> 100, we can consider test pass).
>>
>> As far as I see, devs think this test should be rewritten or simplified,
>> in order to reduce the noise in LTP, testing only features which can be
>> actually tested.
>>
>> Feel free to take a look if you think it's worth the energy, but I also
>> suggest to take a look at the kernel, because it could be there's an
>> underlying bug somewhere.
>>
>> Kind regards,
>> Andrea Cervesato
>>
>> On 11/21/24 14:18, Jin Guojie wrote:
>>> When I visithttps://bugzilla.suse.com/show_bug.cgi?id=1196298, an
>>> error occurs showing that
>>>
>>> "You are not authorized to access bug #1196298."
>>>
>>> I tried to register an account by myself, but the error did not change.
>>>
>>> So far, I still can't view the link.
>>>
>>> Please see if there is a way to give me permission to view this bug.
>>>
>>> Or, could you please provide me the explanation of this bug in the email?
>>>
>>> On Thu, Nov 21, 2024 at 5:44 PM Andrea Cervesato
>>> <andrea.cervesato@suse.com> wrote:
>>>> Hi!
>>>>
>>>> Thanks for checking this test, but did you take a look at the
>>>> explanation given in the test "tags" ? ->
>>>> https://bugzilla.suse.com/show_bug.cgi?id=1196298
>>>>
>>>> Regards,
>>>> Andrea
>>>>
>>>> On 11/21/24 04:05, Jin Guojie wrote:
>>>>> When running memcontrol04, TFAIL results will appear on various Linux
>>>>> distributions, kernel versions, and CPUs:
>>>>>
>>>>> (1) Test platform
>>>>>
>>>>> * Linux distribution: Ubuntu 24.10
>>>>> * CPU: X86_64, Arm64
>>>>> * Kernel: 6.6 longterm
>>>>> * glibc:  2.40
>>>>> * LTP version:  commit ec4161186e5, Oct 24 12:18:17 2024
>>>>>
>>>>> (2) Error logs
>>>>>
>>>>> During the operation of memcontrol04, file systems such as ext2, ext3,
>>>>> ext4, xfs, ntfs, and vfat will be tested.
>>>>> For any of the file system, the same TFAIL result will appear:
>>>>>
>>>>> root@vm:~/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
>>>>>
>>>>> tst_test.c:1823: TINFO: === Testing on ext2 ===
>>>>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
>>>>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
>>>>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
>>>>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
>>>>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
>>>>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
>>>>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
>>>>> memcontrol04.c:214: TFAIL: Expect: (F low events=412) == 0
>>>>>
>>>>> tst_test.c:1823: TINFO: === Testing on ext3 ===
>>>>> memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
>>>>> memcontrol04.c:211: TPASS: Expect: (C low events=437) > 0
>>>>> memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
>>>>> memcontrol04.c:211: TPASS: Expect: (D low events=437) > 0
>>>>> memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
>>>>> memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
>>>>> memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
>>>>> memcontrol04.c:214: TFAIL: Expect: (F low events=411) == 0
>>>>>
>>>>> ......
>>>>>
>>>>> Summary:
>>>>> passed   55
>>>>> failed   5
>>>>> broken   0
>>>>> skipped  0
>>>>> warnings 0
>>>>>
>>>>> It looks like there is an error in the processing logic of cgroup F.
>>>>>
>>>>> (3) Cause analysis
>>>>>
>>>>> In the test_memcg_low() function, 4 subgroups (C, D, E, F) are created under B,
>>>>> and 50MB pagecache is allocated in C, D, and F. Therefore, when checking whether
>>>>> it is successful at the end, only E should be judged to have low_events==0,
>>>>> and the judgment conditions for all other subgroups should be low_events > 0.
>>>>>
>>>>> (4) Fix issure
>>>>>
>>>>> #1209
>>>>> https://github.com/linux-test-project/ltp/issues/1209
>>>>>
>>>>> Signed-off-by: Jin Guojie<guojie.jin@gmail.com>
>>>>>
>>>>> ---
>>>>>     testcases/kernel/controllers/memcg/memcontrol04.c | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
>>>>> b/testcases/kernel/controllers/memcg/memcontrol04.c
>>>>> index 1b8d115f8..0dddb7449 100644
>>>>> --- a/testcases/kernel/controllers/memcg/memcontrol04.c
>>>>> +++ b/testcases/kernel/controllers/memcg/memcontrol04.c
>>>>> @@ -207,7 +207,7 @@ static void test_memcg_low(void)
>>>>>
>>>>>                    TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);
>>>>>
>>>>> -               if (i < E) {
>>>>> +               if (i != E) {
>>>>>                            TST_EXP_EXPR(low > 0,
>>>>>                                         "(%c low events=%ld) > 0", id, low);
>>>>>                    } else {
>>>>> --
>>>>> 2.45.2
>>>>>
Jin Guojie Dec. 4, 2024, 7:36 a.m. UTC | #6
Hi Andrea,

I carefully analyzed the description in memcontrol04.c and the memory
allocation in the code, and found no problems with the case.
In addition, memcontrol04.c comes from test_memcontrol.c in kernel
selftests, and the contents of the two are exactly the same.
By the way, test_memcontrol.c in kselftest also failed the test. The
error is "nonzero memory.events: low in the unprotected A/B/E group",
exactly the same as memcontrol04.c.
It is a reported problem in the kernel mailing lists, but no one has solved it.
(https://lore.kernel.org/all/20230522095233.4246-2-haifeng.xu@shopee.com/T/#mfda771d975e71816ae70d4ac35f957a56cb12bb1)

According to the hint in your previous email, "to take a look at the
kernel, because it could be there's an underlying bug somewhere",
I checked the processing of memory.low in the kernel code and found a
key problem.

When the kernel reclaims memory, it determines the low boundary of a
cgroup not based on the memory.low set by the user, but based on the
"effective-low-boundary".
The calculation of effective-low-boundary is done by the function
effective_protection() in mm/page_counter.c.

The comments of effective_protection() say that,
" a cgroup's effective-low-boundary is derived from its own
memory.low, its parent's and siblings' memory.low, as well as the
actual memory usage in the tree.
If the children aren't claiming (all of) the protection afforded to
them by the parent, distribute the remainder in proportion to the
(unprotected) memory of each cgroup."

The following is a code snippet from effective_protection(),

        if (parent_effective > siblings_protected &&
            parent_usage > siblings_protected &&
            usage > protected) {
                unsigned long unclaimed;
                unclaimed = parent_effective - siblings_protected;
                unclaimed *= usage - protected;
                unclaimed /= parent_usage - siblings_protected;
                ep += unclaimed;
        }

According to the above algorithm, several children will share the
remaining space of the parent in proportion.

For a cgroup with memory.low=0, this code will cause the
effective-low-boundary calculated to be greater than 0.

This leads to a series of unexpected results:
1. When running memcontrol04, even though we specify memory.low of
group F to be 0, but the effective-low-boundary of group F in kernel
is greater than 0.
2. There will always be some allocated memory(about 100KB) in group F
that is not completely released。
3. When reclaim occurs, the "low" value in group F's memory.events
file will increase, resulting in a non-zero result.

In the cgroup documentation,
(https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html)
it is said that the meaning of "low" in the memory.events file is "The
number of times the cgroup is reclaimed due to high memory pressure
even though its usage is under the low boundary."
This means that, if the memory.low value is set to be 0, the "low"
value in memory.events should never be increased.

To fix this problem, the key is that effective-low-boundary should not
be greater than memory.low at any time.

I tried to modify the calculation method of effective-low-boundary in
the kernel, by simply adding a line of judgment.

The following experiments are conducted in the latest kernel (6.13.0-rc1+)

(1) Code change to effective_protection()

diff --git a/mm/page_counter.c b/mm/page_counter.c
index b249d15af9dd..0c86b7885143 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -404,6 +404,7 @@ static unsigned long effective_protection(unsigned
long usage,
                ep += unclaimed;
        }

+       ep = (ep > setting) ? setting : ep;
        return ep;
 }

(2) with the modified kernel, memcontrol04 can pass, and groups F's
memory.events:low remains 0.

memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=479) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=479) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (F low events=0) == 0

Summary:
passed   60
failed   0
broken   0
skipped  0


Please take a look at the above analysis and modification of the
kernel and see if it is correct.
If it is reasonable, I may submit this issue to the kernel mailing lists.

Jin
diff mbox series

Patch

diff --git a/testcases/kernel/controllers/memcg/memcontrol04.c
b/testcases/kernel/controllers/memcg/memcontrol04.c
index 1b8d115f8..0dddb7449 100644
--- a/testcases/kernel/controllers/memcg/memcontrol04.c
+++ b/testcases/kernel/controllers/memcg/memcontrol04.c
@@ -207,7 +207,7 @@  static void test_memcg_low(void)

                TST_EXP_EXPR(oom == 0, "(%c oom events=%ld) == 0", id, oom);

-               if (i < E) {
+               if (i != E) {
                        TST_EXP_EXPR(low > 0,
                                     "(%c low events=%ld) > 0", id, low);
                } else {