Message ID | 20180417110615.16043-1-liwang@redhat.com |
---|---|
State | Superseded |
Delegated to: | Cyril Hrubis |
Headers | show |
Series | [RFC] mm: correct status code which move_pages() returns for zero page | expand |
On Tue 17-04-18 19:06:15, Li Wang wrote: [...] > diff --git a/mm/migrate.c b/mm/migrate.c > index f65dd69..2b315fc 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > continue; > > err = store_status(status, i, err, 1); > - if (err) > + if (!err) > goto out_flush; This change just doesn't make any sense to me. Why should we bail out if the store_status is successul? I am trying to wrap my head around the test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to explain that move_pages has some semantic issues and the new implementation might be not 100% replacement. Anyway I am studying the test case to come up with a proper fix. > > err = do_move_pages_to_node(mm, &pagelist, current_node); > -- > 2.9.5 >
On Tue 17-04-18 15:03:00, Michal Hocko wrote: > On Tue 17-04-18 19:06:15, Li Wang wrote: > [...] > > diff --git a/mm/migrate.c b/mm/migrate.c > > index f65dd69..2b315fc 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > > continue; > > > > err = store_status(status, i, err, 1); > > - if (err) > > + if (!err) > > goto out_flush; > > This change just doesn't make any sense to me. Why should we bail out if > the store_status is successul? I am trying to wrap my head around the > test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to > explain that move_pages has some semantic issues and the new > implementation might be not 100% replacement. Anyway I am studying the > test case to come up with a proper fix. OK, I get what the test cases does. I've failed to see the subtle difference between alloc_pages_on_node and numa_alloc_onnode. The later doesn't faul in anything. Why are we getting EPERM is quite not yet clear to me. add_page_for_migration uses FOLL_DUMP which should return EFAULT on zero pages (no_page_table()). err = PTR_ERR(page); if (IS_ERR(page)) goto out; therefore bails out from add_page_for_migration and store_status should store that value. There shouldn't be any EPERM on the way. Let me try to reproduce and see what is going on. Btw. which kernel do you try this on?
On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote: > On Tue 17-04-18 15:03:00, Michal Hocko wrote: > > On Tue 17-04-18 19:06:15, Li Wang wrote: > > [...] > > > diff --git a/mm/migrate.c b/mm/migrate.c > > > index f65dd69..2b315fc 100644 > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, > nodemask_t task_nodes, > > > continue; > > > > > > err = store_status(status, i, err, 1); > > > - if (err) > > > + if (!err) > > > goto out_flush; > > > > This change just doesn't make any sense to me. Why should we bail out if > > the store_status is successul? I am trying to wrap my head around the > > test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to > > explain that move_pages has some semantic issues and the new > > implementation might be not 100% replacement. Anyway I am studying the > > test case to come up with a proper fix. > > OK, I get what the test cases does. I've failed to see the subtle > difference between alloc_pages_on_node and numa_alloc_onnode. The later > doesn't faul in anything. > > Why are we getting EPERM is quite not yet clear to me. > add_page_for_migration uses FOLL_DUMP which should return EFAULT on > zero pages (no_page_table()). > > err = PTR_ERR(page); > if (IS_ERR(page)) > goto out; > > therefore bails out from add_page_for_migration and store_status should > store that value. There shouldn't be any EPERM on the way. > Yes, I print the the return value and confirmed the add_page_for_migration() do right things for zero page. and after store_status(...) the status saves -EFAULT. So I did the change above. > > Let me try to reproduce and see what is going on. Btw. which kernel do > you try this on? > The latest mainline kernel-4.17-rc1.
On Tue 17-04-18 22:28:33, Li Wang wrote: > On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote: > > > On Tue 17-04-18 15:03:00, Michal Hocko wrote: > > > On Tue 17-04-18 19:06:15, Li Wang wrote: > > > [...] > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > > > index f65dd69..2b315fc 100644 > > > > --- a/mm/migrate.c > > > > +++ b/mm/migrate.c > > > > @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, > > nodemask_t task_nodes, > > > > continue; > > > > > > > > err = store_status(status, i, err, 1); > > > > - if (err) > > > > + if (!err) > > > > goto out_flush; > > > > > > This change just doesn't make any sense to me. Why should we bail out if > > > the store_status is successul? I am trying to wrap my head around the > > > test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to > > > explain that move_pages has some semantic issues and the new > > > implementation might be not 100% replacement. Anyway I am studying the > > > test case to come up with a proper fix. > > > > OK, I get what the test cases does. I've failed to see the subtle > > difference between alloc_pages_on_node and numa_alloc_onnode. The later > > doesn't faul in anything. > > > > Why are we getting EPERM is quite not yet clear to me. > > add_page_for_migration uses FOLL_DUMP which should return EFAULT on > > zero pages (no_page_table()). > > > > err = PTR_ERR(page); > > if (IS_ERR(page)) > > goto out; > > > > therefore bails out from add_page_for_migration and store_status should > > store that value. There shouldn't be any EPERM on the way. > > > > Yes, I print the the return value and confirmed the > add_page_for_migration() > do right things for zero page. and after store_status(...) the status saves > -EFAULT. > So I did the change above. OK, I guess I knnow what is going on. I must be overwriting the status on the way out by out_flush: /* Make sure we do not overwrite the existing error */ err1 = do_move_pages_to_node(mm, &pagelist, current_node); if (!err1) err1 = store_status(status, start, current_node, i - start); This error handling is rather fragile and I was quite unhappy about it at the time I was developing it. I have to remember all the details why I've done it that way but I would bet my hat this is it. More on this tomorrow.
On 17 Apr 2018, at 15:00, Michal Hocko wrote: > On Tue 17-04-18 22:28:33, Li Wang wrote: >> On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote: >> >>> On Tue 17-04-18 15:03:00, Michal Hocko wrote: >>>> On Tue 17-04-18 19:06:15, Li Wang wrote: >>>> [...] >>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>> index f65dd69..2b315fc 100644 >>>>> --- a/mm/migrate.c >>>>> +++ b/mm/migrate.c >>>>> @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, >>> nodemask_t task_nodes, >>>>> continue; >>>>> >>>>> err = store_status(status, i, err, 1); >>>>> - if (err) >>>>> + if (!err) >>>>> goto out_flush; >>>> >>>> This change just doesn't make any sense to me. Why should we bail out if >>>> the store_status is successul? I am trying to wrap my head around the >>>> test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to >>>> explain that move_pages has some semantic issues and the new >>>> implementation might be not 100% replacement. Anyway I am studying the >>>> test case to come up with a proper fix. >>> >>> OK, I get what the test cases does. I've failed to see the subtle >>> difference between alloc_pages_on_node and numa_alloc_onnode. The later >>> doesn't faul in anything. >>> >>> Why are we getting EPERM is quite not yet clear to me. >>> add_page_for_migration uses FOLL_DUMP which should return EFAULT on >>> zero pages (no_page_table()). >>> >>> err = PTR_ERR(page); >>> if (IS_ERR(page)) >>> goto out; >>> >>> therefore bails out from add_page_for_migration and store_status should >>> store that value. There shouldn't be any EPERM on the way. >>> >> >> Yes, I print the the return value and confirmed the >> add_page_for_migration() >> do right things for zero page. and after store_status(...) the status saves >> -EFAULT. >> So I did the change above. > > OK, I guess I knnow what is going on. I must be overwriting the status > on the way out by > > out_flush: > /* Make sure we do not overwrite the existing error */ > err1 = do_move_pages_to_node(mm, &pagelist, current_node); > if (!err1) > err1 = store_status(status, start, current_node, i - start); > > This error handling is rather fragile and I was quite unhappy about it > at the time I was developing it. I have to remember all the details why > I've done it that way but I would bet my hat this is it. More on this > tomorrow. Hi Michal and Li, The problem is that the variable start is not set properly after store_status(), like the "start = i;" after the first store_status(). The following patch should fix the problem (it has passed all move_pages test cases from ltp on my machine): diff --git a/mm/migrate.c b/mm/migrate.c index f65dd69e1fd1..32afa4723e7f 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, if (err) goto out; } + /* Move to next page (i+1), after we have saved page status (until i) */ + start = i + 1; current_node = NUMA_NO_NODE; } out_flush: Feel free to check it by yourselves. -- Best Regards Yan Zi
On Tue 17-04-18 16:09:33, Zi Yan wrote: > On 17 Apr 2018, at 15:00, Michal Hocko wrote: > > > On Tue 17-04-18 22:28:33, Li Wang wrote: > >> On Tue, Apr 17, 2018 at 10:14 PM, Michal Hocko <mhocko@suse.com> wrote: > >> > >>> On Tue 17-04-18 15:03:00, Michal Hocko wrote: > >>>> On Tue 17-04-18 19:06:15, Li Wang wrote: > >>>> [...] > >>>>> diff --git a/mm/migrate.c b/mm/migrate.c > >>>>> index f65dd69..2b315fc 100644 > >>>>> --- a/mm/migrate.c > >>>>> +++ b/mm/migrate.c > >>>>> @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, > >>> nodemask_t task_nodes, > >>>>> continue; > >>>>> > >>>>> err = store_status(status, i, err, 1); > >>>>> - if (err) > >>>>> + if (!err) > >>>>> goto out_flush; > >>>> > >>>> This change just doesn't make any sense to me. Why should we bail out if > >>>> the store_status is successul? I am trying to wrap my head around the > >>>> test case. 6b9d757ecafc ("mm, numa: rework do_pages_move") tried to > >>>> explain that move_pages has some semantic issues and the new > >>>> implementation might be not 100% replacement. Anyway I am studying the > >>>> test case to come up with a proper fix. > >>> > >>> OK, I get what the test cases does. I've failed to see the subtle > >>> difference between alloc_pages_on_node and numa_alloc_onnode. The later > >>> doesn't faul in anything. > >>> > >>> Why are we getting EPERM is quite not yet clear to me. > >>> add_page_for_migration uses FOLL_DUMP which should return EFAULT on > >>> zero pages (no_page_table()). > >>> > >>> err = PTR_ERR(page); > >>> if (IS_ERR(page)) > >>> goto out; > >>> > >>> therefore bails out from add_page_for_migration and store_status should > >>> store that value. There shouldn't be any EPERM on the way. > >>> > >> > >> Yes, I print the the return value and confirmed the > >> add_page_for_migration() > >> do right things for zero page. and after store_status(...) the status saves > >> -EFAULT. > >> So I did the change above. > > > > OK, I guess I knnow what is going on. I must be overwriting the status > > on the way out by > > > > out_flush: > > /* Make sure we do not overwrite the existing error */ > > err1 = do_move_pages_to_node(mm, &pagelist, current_node); > > if (!err1) > > err1 = store_status(status, start, current_node, i - start); > > > > This error handling is rather fragile and I was quite unhappy about it > > at the time I was developing it. I have to remember all the details why > > I've done it that way but I would bet my hat this is it. More on this > > tomorrow. > > Hi Michal and Li, > > The problem is that the variable start is not set properly after store_status(), > like the "start = i;" after the first store_status(). > > The following patch should fix the problem (it has passed all move_pages test cases from ltp > on my machine): > > diff --git a/mm/migrate.c b/mm/migrate.c > index f65dd69e1fd1..32afa4723e7f 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > if (err) > goto out; > } > + /* Move to next page (i+1), after we have saved page status (until i) */ > + start = i + 1; > current_node = NUMA_NO_NODE; > } > out_flush: > > Feel free to check it by yourselves. Yes, you are right. I never update start if the last page in the range fails and so we overwrite the whole [start, i] range. I wish the code wasn't that ugly and subtle but considering how we can fail in different ways and that we want to batch as much as possible I do not see an easy way. Care to send the patch? I would just drop the comment. Thanks!
On Wed 18-04-18 11:07:22, Michal Hocko wrote: > On Tue 17-04-18 16:09:33, Zi Yan wrote: [...] > > diff --git a/mm/migrate.c b/mm/migrate.c > > index f65dd69e1fd1..32afa4723e7f 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > > if (err) > > goto out; > > } > > + /* Move to next page (i+1), after we have saved page status (until i) */ > > + start = i + 1; > > current_node = NUMA_NO_NODE; > > } > > out_flush: > > > > Feel free to check it by yourselves. > > Yes, you are right. I never update start if the last page in the range > fails and so we overwrite the whole [start, i] range. I wish the code > wasn't that ugly and subtle but considering how we can fail in different > ways and that we want to batch as much as possible I do not see an easy > way. > > Care to send the patch? I would just drop the comment. Hmm, thinking about it some more. An alternative would be to check for list_empty on the page list. It is a bit larger diff but maybe that would be tiny bit cleaner because there is simply no point to call do_move_pages_to_node on an empty list in the first place. --- diff --git a/mm/migrate.c b/mm/migrate.c index 507cf9ba21bf..46f93b9ba724 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1634,12 +1634,14 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, current_node = NUMA_NO_NODE; } out_flush: - /* Make sure we do not overwrite the existing error */ - err1 = do_move_pages_to_node(mm, &pagelist, current_node); - if (!err1) - err1 = store_status(status, start, current_node, i - start); - if (!err) - err = err1; + if (!list_empty(&pagelist)) { + /* Make sure we do not overwrite the existing error */ + err1 = do_move_pages_to_node(mm, &pagelist, current_node); + if (!err1) + err1 = store_status(status, start, current_node, i - start); + if (!err) + err = err1; + } out: return err; }
On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote: > On Wed 18-04-18 11:07:22, Michal Hocko wrote: > > On Tue 17-04-18 16:09:33, Zi Yan wrote: > [...] > > > diff --git a/mm/migrate.c b/mm/migrate.c > > > index f65dd69e1fd1..32afa4723e7f 100644 > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, > nodemask_t task_nodes, > > > if (err) > > > goto out; > > > } > > > + /* Move to next page (i+1), after we have saved page > status (until i) */ > > > + start = i + 1; > > > current_node = NUMA_NO_NODE; > > > } > > > out_flush: > > > > > > Feel free to check it by yourselves. > > > > Yes, you are right. I never update start if the last page in the range > > fails and so we overwrite the whole [start, i] range. I wish the code > > wasn't that ugly and subtle but considering how we can fail in different > > ways and that we want to batch as much as possible I do not see an easy > > way. > > > > Care to send the patch? I would just drop the comment. > > Hmm, thinking about it some more. An alternative would be to check for > list_empty on the page list. It is a bit larger diff but maybe that > would be tiny bit cleaner because there is simply no point to call > do_move_pages_to_node on an empty list in the first place. > Hi Michal, Zi I tried your patch separately, both of them works fine to me.
On Wed 18-04-18 18:39:19, Li Wang wrote: > On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote: > > > On Wed 18-04-18 11:07:22, Michal Hocko wrote: > > > On Tue 17-04-18 16:09:33, Zi Yan wrote: > > [...] > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > > > index f65dd69e1fd1..32afa4723e7f 100644 > > > > --- a/mm/migrate.c > > > > +++ b/mm/migrate.c > > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm, > > nodemask_t task_nodes, > > > > if (err) > > > > goto out; > > > > } > > > > + /* Move to next page (i+1), after we have saved page > > status (until i) */ > > > > + start = i + 1; > > > > current_node = NUMA_NO_NODE; > > > > } > > > > out_flush: > > > > > > > > Feel free to check it by yourselves. > > > > > > Yes, you are right. I never update start if the last page in the range > > > fails and so we overwrite the whole [start, i] range. I wish the code > > > wasn't that ugly and subtle but considering how we can fail in different > > > ways and that we want to batch as much as possible I do not see an easy > > > way. > > > > > > Care to send the patch? I would just drop the comment. > > > > Hmm, thinking about it some more. An alternative would be to check for > > list_empty on the page list. It is a bit larger diff but maybe that > > would be tiny bit cleaner because there is simply no point to call > > do_move_pages_to_node on an empty list in the first place. > > > > Hi Michal, Zi > > I tried your patch separately, both of them works fine to me. Thanks for retesting! Do you plan to post a patch with the changelog or should I do it?
On Wed, Apr 18, 2018, 19:29 Michal Hocko <mhocko@suse.com> wrote: > On Wed 18-04-18 18:39:19, Li Wang wrote: > > On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <mhocko@suse.com> wrote: > > > > > On Wed 18-04-18 11:07:22, Michal Hocko wrote: > > > > On Tue 17-04-18 16:09:33, Zi Yan wrote: > > > [...] > > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > > > > index f65dd69e1fd1..32afa4723e7f 100644 > > > > > --- a/mm/migrate.c > > > > > +++ b/mm/migrate.c > > > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct > *mm, > > > nodemask_t task_nodes, > > > > > if (err) > > > > > goto out; > > > > > } > > > > > + /* Move to next page (i+1), after we have saved > page > > > status (until i) */ > > > > > + start = i + 1; > > > > > current_node = NUMA_NO_NODE; > > > > > } > > > > > out_flush: > > > > > > > > > > Feel free to check it by yourselves. > > > > > > > > Yes, you are right. I never update start if the last page in the > range > > > > fails and so we overwrite the whole [start, i] range. I wish the code > > > > wasn't that ugly and subtle but considering how we can fail in > different > > > > ways and that we want to batch as much as possible I do not see an > easy > > > > way. > > > > > > > > Care to send the patch? I would just drop the comment. > > > > > > Hmm, thinking about it some more. An alternative would be to check for > > > list_empty on the page list. It is a bit larger diff but maybe that > > > would be tiny bit cleaner because there is simply no point to call > > > do_move_pages_to_node on an empty list in the first place. > > > > > > > Hi Michal, Zi > > > > I tried your patch separately, both of them works fine to me. > > Thanks for retesting! Do you plan to post a patch with the changelog or > should I do it? > You better. Li Wang <div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Apr 18, 2018, 19:29 Michal Hocko <<a href="mailto:mhocko@suse.com" target="_blank" rel="noreferrer">mhocko@suse.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Wed 18-04-18 18:39:19, Li Wang wrote:<br> > On Wed, Apr 18, 2018 at 5:19 PM, Michal Hocko <<a href="mailto:mhocko@suse.com" rel="noreferrer noreferrer" target="_blank">mhocko@suse.com</a>> wrote:<br> > <br> > > On Wed 18-04-18 11:07:22, Michal Hocko wrote:<br> > > > On Tue 17-04-18 16:09:33, Zi Yan wrote:<br> > > [...]<br> > > > > diff --git a/mm/migrate.c b/mm/migrate.c<br> > > > > index f65dd69e1fd1..32afa4723e7f 100644<br> > > > > --- a/mm/migrate.c<br> > > > > +++ b/mm/migrate.c<br> > > > > @@ -1619,6 +1619,8 @@ static int do_pages_move(struct mm_struct *mm,<br> > > nodemask_t task_nodes,<br> > > > > if (err)<br> > > > > goto out;<br> > > > > }<br> > > > > + /* Move to next page (i+1), after we have saved page<br> > > status (until i) */<br> > > > > + start = i + 1;<br> > > > > current_node = NUMA_NO_NODE;<br> > > > > }<br> > > > > out_flush:<br> > > > ><br> > > > > Feel free to check it by yourselves.<br> > > ><br> > > > Yes, you are right. I never update start if the last page in the range<br> > > > fails and so we overwrite the whole [start, i] range. I wish the code<br> > > > wasn't that ugly and subtle but considering how we can fail in different<br> > > > ways and that we want to batch as much as possible I do not see an easy<br> > > > way.<br> > > ><br> > > > Care to send the patch? I would just drop the comment.<br> > ><br> > > Hmm, thinking about it some more. An alternative would be to check for<br> > > list_empty on the page list. It is a bit larger diff but maybe that<br> > > would be tiny bit cleaner because there is simply no point to call<br> > > do_move_pages_to_node on an empty list in the first place.<br> > ><br> > <br> > Hi Michal, Zi<br> > <br> > I tried your patch separately, both of them works fine to me.<br> <br> Thanks for retesting! Do you plan to post a patch with the changelog or<br> should I do it?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">You better.</div><div dir="auto"><br></div><div dir="auto">Li Wang</div></div>
diff --git a/mm/migrate.c b/mm/migrate.c index f65dd69..2b315fc 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1608,7 +1608,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, continue; err = store_status(status, i, err, 1); - if (err) + if (!err) goto out_flush; err = do_move_pages_to_node(mm, &pagelist, current_node);
move_pages(2) declears that status code for zero page is supposed to be -EFAULT. But now it (LTP/move_pages04 test) gets -EPERM, the root cause is that not goto out_flush after store_status() saves the err which add_page_for_migration() returns for zero page. LTP move_pages04: TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT Signed-off-by: Li Wang <liwang@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)