mbox series

[0/9] Add support for SVM atomics in Nouveau

Message ID 20210209010722.13839-1-apopple@nvidia.com
Headers show
Series Add support for SVM atomics in Nouveau | expand

Message

Alistair Popple Feb. 9, 2021, 1:07 a.m. UTC
This series adds support to Nouveau for atomic memory operations on OpenCL
shared virtual memory (SVM). This is achieved using the atomic PTE bits on
the GPU to only permit atomic operations to system memory when a page is
not mapped in userspace on the CPU.

This is implemented by adding a mode to migrate_vma_pages() which unmaps
and isolates existing pages from the CPU and pins them. The original
userspace page table entries are migrated to point to device private pages
allocated by the driver. This allows the driver to enable GPU atomic access
to the page as it will receive a callback when CPU userspace needs to
access it.

In response to this callback the driver revokes the atomic access
permission from the GPU and migrates entries to point back to the original
page. The original page is unpinned as part of the migration operation
which also returns it to the LRU.

Patch 3 contains the bulk of the memory management changes to implement
unmap and pin.

Patches 6-9 extend Nouveau to use the new mode to allow system wide atomics
for OpenCL SVM to be implemented on Nouveau.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
buffer whilst also writing to the same buffer from the CPU.

Problems yet to be addressed:

Recent changes to pin_user_pages() prevent the creation of pinned pages in
ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
as attempts to migrate may fail which would be fatal to userspace.

In this case migration of the pinned page is unnecessary as the page can be
unpinned at anytime by having the driver revoke atomic permission as it
does for the migrate_to_ram() callback. However a method of calling this
when memory needs to be moved has yet to be resolved so any discussion is
welcome.

Alistair Popple (9):
  mm/migrate.c: Always allow device private pages to migrate
  mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
  mm/migrate: Add a unmap and pin migration mode
  Documentation: Add unmap and pin to HMM
  hmm-tests: Add test for unmap and pin
  nouveau/dmem: Only map migrating pages
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/dmem: Add support for multiple page types
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst                      |  22 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
 drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
 include/linux/migrate.h                       |   2 +
 include/linux/migrate_mode.h                  |   1 +
 lib/test_hmm.c                                | 109 ++++++++--
 lib/test_hmm_uapi.h                           |   1 +
 mm/migrate.c                                  |  82 +++++---
 tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
 14 files changed, 524 insertions(+), 101 deletions(-)

Comments

Daniel Vetter Feb. 9, 2021, 10:27 a.m. UTC | #1
On Tue, Feb 09, 2021 at 12:07:13PM +1100, Alistair Popple wrote:
> This series adds support to Nouveau for atomic memory operations on OpenCL
> shared virtual memory (SVM). This is achieved using the atomic PTE bits on
> the GPU to only permit atomic operations to system memory when a page is
> not mapped in userspace on the CPU.
>
> This is implemented by adding a mode to migrate_vma_pages() which unmaps
> and isolates existing pages from the CPU and pins them. The original
> userspace page table entries are migrated to point to device private pages
> allocated by the driver. This allows the driver to enable GPU atomic access
> to the page as it will receive a callback when CPU userspace needs to
> access it.
>
> In response to this callback the driver revokes the atomic access
> permission from the GPU and migrates entries to point back to the original
> page. The original page is unpinned as part of the migration operation
> which also returns it to the LRU.
>
> Patch 3 contains the bulk of the memory management changes to implement
> unmap and pin.
>
> Patches 6-9 extend Nouveau to use the new mode to allow system wide atomics
> for OpenCL SVM to be implemented on Nouveau.
>
> This has been tested using the latest upstream Mesa userspace with a simple
> OpenCL test program which checks the results of atomic GPU operations on a
> buffer whilst also writing to the same buffer from the CPU.
>
> Problems yet to be addressed:
>
> Recent changes to pin_user_pages() prevent the creation of pinned pages in
> ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
> as attempts to migrate may fail which would be fatal to userspace.
>
> In this case migration of the pinned page is unnecessary as the page can be
> unpinned at anytime by having the driver revoke atomic permission as it
> does for the migrate_to_ram() callback. However a method of calling this
> when memory needs to be moved has yet to be resolved so any discussion is
> welcome.

Why do we need to pin for gpu atomics? You still have the callback for
cpu faults, so you
can move the page as needed, and hence a long-term pin sounds like the
wrong approach.

That would avoid all the hacking around long term pin constraints, because
for real unmoveable long term pinned memory we really want to have all
these checks. So I think we might be missing some other callbacks to be
able to move these pages, instead of abusing longterm pins for lack of
better tools.

Cheers, Daniel



>
> Alistair Popple (9):
>   mm/migrate.c: Always allow device private pages to migrate
>   mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
>   mm/migrate: Add a unmap and pin migration mode
>   Documentation: Add unmap and pin to HMM
>   hmm-tests: Add test for unmap and pin
>   nouveau/dmem: Only map migrating pages
>   nouveau/svm: Refactor nouveau_range_fault
>   nouveau/dmem: Add support for multiple page types
>   nouveau/svm: Implement atomic SVM access
>
>  Documentation/vm/hmm.rst                      |  22 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
>  drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
>  include/linux/migrate.h                       |   2 +
>  include/linux/migrate_mode.h                  |   1 +
>  lib/test_hmm.c                                | 109 ++++++++--
>  lib/test_hmm_uapi.h                           |   1 +
>  mm/migrate.c                                  |  82 +++++---
>  tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
>  14 files changed, 524 insertions(+), 101 deletions(-)
>
> --
> 2.20.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Alistair Popple Feb. 9, 2021, 12:57 p.m. UTC | #2
On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> >
> > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > ZONE_MOVABLE. This series allows pinned pages to be created in 
ZONE_MOVABLE
> > as attempts to migrate may fail which would be fatal to userspace.
> >
> > In this case migration of the pinned page is unnecessary as the page can 
be
> > unpinned at anytime by having the driver revoke atomic permission as it
> > does for the migrate_to_ram() callback. However a method of calling this
> > when memory needs to be moved has yet to be resolved so any discussion is
> > welcome.
> 
> Why do we need to pin for gpu atomics? You still have the callback for
> cpu faults, so you
> can move the page as needed, and hence a long-term pin sounds like the
> wrong approach.

Technically a real long term unmoveable pin isn't required, because as you say 
the page can be moved as needed at any time. However I needed some way of 
stopping the CPU page from being freed once the userspace mappings for it had 
been removed. Obviously I could have just used get_page() but from the 
perspective of page migration the result is much the same as a pin - a page 
which can't be moved because of the extra refcount.

The normal solution of registering an MMU notifier to unpin the page when it 
needs to be moved also doesn't work as the CPU page tables now point to the 
device-private page and hence the migration code won't call any invalidate 
notifiers for the CPU page.

> That would avoid all the hacking around long term pin constraints, because
> for real unmoveable long term pinned memory we really want to have all
> these checks. So I think we might be missing some other callbacks to be
> able to move these pages, instead of abusing longterm pins for lack of
> better tools.

Yes, I would like to avoid the long term pin constraints as well if possible I 
just haven't found a solution yet. Are you suggesting it might be possible to 
add a callback in the page migration logic to specially deal with moving these 
pages?

Thanks, Alistair

> Cheers, Daniel
> 
> 
> 
> >
> > Alistair Popple (9):
> >   mm/migrate.c: Always allow device private pages to migrate
> >   mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
> >   mm/migrate: Add a unmap and pin migration mode
> >   Documentation: Add unmap and pin to HMM
> >   hmm-tests: Add test for unmap and pin
> >   nouveau/dmem: Only map migrating pages
> >   nouveau/svm: Refactor nouveau_range_fault
> >   nouveau/dmem: Add support for multiple page types
> >   nouveau/svm: Implement atomic SVM access
> >
> >  Documentation/vm/hmm.rst                      |  22 +-
> >  arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
> >  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
> >  drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
> >  drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
> >  drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
> >  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
> >  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
> >  include/linux/migrate.h                       |   2 +
> >  include/linux/migrate_mode.h                  |   1 +
> >  lib/test_hmm.c                                | 109 ++++++++--
> >  lib/test_hmm_uapi.h                           |   1 +
> >  mm/migrate.c                                  |  82 +++++---
> >  tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
> >  14 files changed, 524 insertions(+), 101 deletions(-)
> >
> > --
> > 2.20.1
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Jason Gunthorpe Feb. 9, 2021, 1:35 p.m. UTC | #3
On Tue, Feb 09, 2021 at 11:57:28PM +1100, Alistair Popple wrote:
> On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > >
> > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > ZONE_MOVABLE. This series allows pinned pages to be created in 
> ZONE_MOVABLE
> > > as attempts to migrate may fail which would be fatal to userspace.
> > >
> > > In this case migration of the pinned page is unnecessary as the page can 
> be
> > > unpinned at anytime by having the driver revoke atomic permission as it
> > > does for the migrate_to_ram() callback. However a method of calling this
> > > when memory needs to be moved has yet to be resolved so any discussion is
> > > welcome.
> > 
> > Why do we need to pin for gpu atomics? You still have the callback for
> > cpu faults, so you
> > can move the page as needed, and hence a long-term pin sounds like the
> > wrong approach.
> 
> Technically a real long term unmoveable pin isn't required, because as you say 
> the page can be moved as needed at any time. However I needed some way of 
> stopping the CPU page from being freed once the userspace mappings for it had 
> been removed. 

The issue is you took the page out of the PTE it belongs to, which
makes it orphaned and unlocatable by the rest of the mm?

Ideally this would leave the PTE in place so everything continues to
work, just disable CPU access to it.

Maybe some kind of special swap entry?

I also don't much like the use of ZONE_DEVICE here, that should only
be used for actual device memory, not as a temporary proxy for CPU
pages.. Having two struct pages refer to the same physical memory is
pretty ugly.

> The normal solution of registering an MMU notifier to unpin the page when it 
> needs to be moved also doesn't work as the CPU page tables now point to the
> device-private page and hence the migration code won't call any invalidate 
> notifiers for the CPU page.

The fact the page is lost from the MM seems to be the main issue here.

> Yes, I would like to avoid the long term pin constraints as well if possible I 
> just haven't found a solution yet. Are you suggesting it might be possible to 
> add a callback in the page migration logic to specially deal with moving these 
> pages?

How would migration even find the page?

Jason
Daniel Vetter Feb. 9, 2021, 1:37 p.m. UTC | #4
On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple <apopple@nvidia.com> wrote:
>
> On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > >
> > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > ZONE_MOVABLE. This series allows pinned pages to be created in
> ZONE_MOVABLE
> > > as attempts to migrate may fail which would be fatal to userspace.
> > >
> > > In this case migration of the pinned page is unnecessary as the page can
> be
> > > unpinned at anytime by having the driver revoke atomic permission as it
> > > does for the migrate_to_ram() callback. However a method of calling this
> > > when memory needs to be moved has yet to be resolved so any discussion is
> > > welcome.
> >
> > Why do we need to pin for gpu atomics? You still have the callback for
> > cpu faults, so you
> > can move the page as needed, and hence a long-term pin sounds like the
> > wrong approach.
>
> Technically a real long term unmoveable pin isn't required, because as you say
> the page can be moved as needed at any time. However I needed some way of
> stopping the CPU page from being freed once the userspace mappings for it had
> been removed. Obviously I could have just used get_page() but from the
> perspective of page migration the result is much the same as a pin - a page
> which can't be moved because of the extra refcount.

long term pin vs short term page reference aren't fully fleshed out.
But the rule more or less is:
- short term page reference: _must_ get released in finite time for
migration and other things, either because you have a callback, or
because it's just for direct I/O, which will complete. This means
short term pins will delay migration, but not foul it complete

- long term pin: the page cannot be moved, all migration must fail.
Also this will have an impact on COW behaviour for fork (but not sure
where those patches are, John Hubbard will know).

So I think for your use case here you want a) short term page
reference to make sure it doesn't disappear plus b) callback to make
sure migrate isn't blocked.

Breaking ZONE_MOVEABLE with either allowing long term pins or failing
migrations because you don't release your short term page reference
isn't good.

> The normal solution of registering an MMU notifier to unpin the page when it
> needs to be moved also doesn't work as the CPU page tables now point to the
> device-private page and hence the migration code won't call any invalidate
> notifiers for the CPU page.

Yeah you need some other callback for migration on the page directly.
it's a bit awkward since there is one already for struct
address_space, but that's own by the address_space/page cache, not
HMM. So I think we need something else, maybe something for each
ZONE_DEVICE?

> > That would avoid all the hacking around long term pin constraints, because
> > for real unmoveable long term pinned memory we really want to have all
> > these checks. So I think we might be missing some other callbacks to be
> > able to move these pages, instead of abusing longterm pins for lack of
> > better tools.
>
> Yes, I would like to avoid the long term pin constraints as well if possible I
> just haven't found a solution yet. Are you suggesting it might be possible to
> add a callback in the page migration logic to specially deal with moving these
> pages?

s/possible/need to type some code to address it/ I think.

But also I'm not much of an expert on this, I've only just started
learning how this all fits together coming from the gpu side. There's
a _lot_ of finesse involved.

Cheers, Daniel

>
> Thanks, Alistair
>
> > Cheers, Daniel
> >
> >
> >
> > >
> > > Alistair Popple (9):
> > >   mm/migrate.c: Always allow device private pages to migrate
> > >   mm/migrate.c: Allow pfn flags to be passed to migrate_vma_setup()
> > >   mm/migrate: Add a unmap and pin migration mode
> > >   Documentation: Add unmap and pin to HMM
> > >   hmm-tests: Add test for unmap and pin
> > >   nouveau/dmem: Only map migrating pages
> > >   nouveau/svm: Refactor nouveau_range_fault
> > >   nouveau/dmem: Add support for multiple page types
> > >   nouveau/svm: Implement atomic SVM access
> > >
> > >  Documentation/vm/hmm.rst                      |  22 +-
> > >  arch/powerpc/kvm/book3s_hv_uvmem.c            |   4 +-
> > >  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
> > >  drivers/gpu/drm/nouveau/nouveau_dmem.c        | 190 +++++++++++++++---
> > >  drivers/gpu/drm/nouveau/nouveau_dmem.h        |   9 +
> > >  drivers/gpu/drm/nouveau/nouveau_svm.c         | 148 +++++++++++---
> > >  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
> > >  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
> > >  include/linux/migrate.h                       |   2 +
> > >  include/linux/migrate_mode.h                  |   1 +
> > >  lib/test_hmm.c                                | 109 ++++++++--
> > >  lib/test_hmm_uapi.h                           |   1 +
> > >  mm/migrate.c                                  |  82 +++++---
> > >  tools/testing/selftests/vm/hmm-tests.c        |  49 +++++
> > >  14 files changed, 524 insertions(+), 101 deletions(-)
> > >
> > > --
> > > 2.20.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>
>
>
Daniel Vetter Feb. 9, 2021, 1:39 p.m. UTC | #5
On Tue, Feb 9, 2021 at 2:35 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Feb 09, 2021 at 11:57:28PM +1100, Alistair Popple wrote:
> > On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > > >
> > > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > > ZONE_MOVABLE. This series allows pinned pages to be created in
> > ZONE_MOVABLE
> > > > as attempts to migrate may fail which would be fatal to userspace.
> > > >
> > > > In this case migration of the pinned page is unnecessary as the page can
> > be
> > > > unpinned at anytime by having the driver revoke atomic permission as it
> > > > does for the migrate_to_ram() callback. However a method of calling this
> > > > when memory needs to be moved has yet to be resolved so any discussion is
> > > > welcome.
> > >
> > > Why do we need to pin for gpu atomics? You still have the callback for
> > > cpu faults, so you
> > > can move the page as needed, and hence a long-term pin sounds like the
> > > wrong approach.
> >
> > Technically a real long term unmoveable pin isn't required, because as you say
> > the page can be moved as needed at any time. However I needed some way of
> > stopping the CPU page from being freed once the userspace mappings for it had
> > been removed.
>
> The issue is you took the page out of the PTE it belongs to, which
> makes it orphaned and unlocatable by the rest of the mm?
>
> Ideally this would leave the PTE in place so everything continues to
> work, just disable CPU access to it.
>
> Maybe some kind of special swap entry?

I probably should have read the patches more in detail, I was assuming
the ZONE_DEVICE is only for vram. At least I thought the requirement
for gpu atomics was that the page is in vram, but maybe I'm mixing up
how this works on nvidia with how it works in other places. Iirc we
had a long discussion about this at lpc19 that ended with the
conclusion that we must be able to migrate, and sometimes migration is
blocked. But the details ellude me now.

Either way ZONE_DEVICE for not vram/device memory sounds wrong. Is
that really going on here?
-Daniel

>
> I also don't much like the use of ZONE_DEVICE here, that should only
> be used for actual device memory, not as a temporary proxy for CPU
> pages.. Having two struct pages refer to the same physical memory is
> pretty ugly.
>
> > The normal solution of registering an MMU notifier to unpin the page when it
> > needs to be moved also doesn't work as the CPU page tables now point to the
> > device-private page and hence the migration code won't call any invalidate
> > notifiers for the CPU page.
>
> The fact the page is lost from the MM seems to be the main issue here.
>
> > Yes, I would like to avoid the long term pin constraints as well if possible I
> > just haven't found a solution yet. Are you suggesting it might be possible to
> > add a callback in the page migration logic to specially deal with moving these
> > pages?
>
> How would migration even find the page?
>
> Jason
Jason Gunthorpe Feb. 9, 2021, 1:44 p.m. UTC | #6
On Tue, Feb 09, 2021 at 02:39:51PM +0100, Daniel Vetter wrote:

> Either way ZONE_DEVICE for not vram/device memory sounds wrong. Is
> that really going on here?

My read was this was doing non-coherent atomics on CPU memory.

Atomics on GPU memory is just called migration to GPU memory, it
doesn't need to be special for atomics. In that case it can free the
CPU struct page completely as the data now lives in the ZONE_DEVICE
page so no need for a pin, no problem with movable

Jason
John Hubbard Feb. 9, 2021, 8:53 p.m. UTC | #7
On 2/9/21 5:37 AM, Daniel Vetter wrote:
> On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple <apopple@nvidia.com> wrote:
>>
>> On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
>>>>
>>>> Recent changes to pin_user_pages() prevent the creation of pinned pages in
>>>> ZONE_MOVABLE. This series allows pinned pages to be created in
>> ZONE_MOVABLE
>>>> as attempts to migrate may fail which would be fatal to userspace.
>>>>
>>>> In this case migration of the pinned page is unnecessary as the page can
>> be
>>>> unpinned at anytime by having the driver revoke atomic permission as it
>>>> does for the migrate_to_ram() callback. However a method of calling this
>>>> when memory needs to be moved has yet to be resolved so any discussion is
>>>> welcome.
>>>
>>> Why do we need to pin for gpu atomics? You still have the callback for
>>> cpu faults, so you
>>> can move the page as needed, and hence a long-term pin sounds like the
>>> wrong approach.
>>
>> Technically a real long term unmoveable pin isn't required, because as you say
>> the page can be moved as needed at any time. However I needed some way of
>> stopping the CPU page from being freed once the userspace mappings for it had
>> been removed. Obviously I could have just used get_page() but from the
>> perspective of page migration the result is much the same as a pin - a page
>> which can't be moved because of the extra refcount.
> 
> long term pin vs short term page reference aren't fully fleshed out.
> But the rule more or less is:
> - short term page reference: _must_ get released in finite time for
> migration and other things, either because you have a callback, or
> because it's just for direct I/O, which will complete. This means
> short term pins will delay migration, but not foul it complete


GPU atomic operations to sysmem are hard to categorize, because because application
programmers could easily write programs that do a long series of atomic operations.
Such a program would be a little weird, but it's hard to rule out.


> 
> - long term pin: the page cannot be moved, all migration must fail.
> Also this will have an impact on COW behaviour for fork (but not sure
> where those patches are, John Hubbard will know).


That would be Jason's commit 57efa1fe59576 ("mm/gup: prevent gup_fast from racing
with COW during fork"), which is in linux-next 20201216.


> 
> So I think for your use case here you want a) short term page
> reference to make sure it doesn't disappear plus b) callback to make
> sure migrate isn't blocked.
> 
> Breaking ZONE_MOVEABLE with either allowing long term pins or failing
> migrations because you don't release your short term page reference
> isn't good.
> 
>> The normal solution of registering an MMU notifier to unpin the page when it
>> needs to be moved also doesn't work as the CPU page tables now point to the
>> device-private page and hence the migration code won't call any invalidate
>> notifiers for the CPU page.
> 
> Yeah you need some other callback for migration on the page directly.
> it's a bit awkward since there is one already for struct
> address_space, but that's own by the address_space/page cache, not
> HMM. So I think we need something else, maybe something for each
> ZONE_DEVICE?
> 

This direction sounds at least...possible. Using MMU notifiers instead of pins
is definitely appealing. I'm not quite clear on the callback idea above, but
overall it seems like taking advantage of the ZONE_DEVICE tracking of pages
(without having to put anything additional in each struct page), could work.

Additional notes or ideas here are definitely welcome.



thanks,
Jerome Glisse Feb. 9, 2021, 9:17 p.m. UTC | #8
On Tue, Feb 09, 2021 at 09:35:20AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 09, 2021 at 11:57:28PM +1100, Alistair Popple wrote:
> > On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > > >
> > > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > > ZONE_MOVABLE. This series allows pinned pages to be created in 
> > ZONE_MOVABLE
> > > > as attempts to migrate may fail which would be fatal to userspace.
> > > >
> > > > In this case migration of the pinned page is unnecessary as the page can 
> > be
> > > > unpinned at anytime by having the driver revoke atomic permission as it
> > > > does for the migrate_to_ram() callback. However a method of calling this
> > > > when memory needs to be moved has yet to be resolved so any discussion is
> > > > welcome.
> > > 
> > > Why do we need to pin for gpu atomics? You still have the callback for
> > > cpu faults, so you
> > > can move the page as needed, and hence a long-term pin sounds like the
> > > wrong approach.
> > 
> > Technically a real long term unmoveable pin isn't required, because as you say 
> > the page can be moved as needed at any time. However I needed some way of 
> > stopping the CPU page from being freed once the userspace mappings for it had 
> > been removed. 
> 
> The issue is you took the page out of the PTE it belongs to, which
> makes it orphaned and unlocatable by the rest of the mm?
> 
> Ideally this would leave the PTE in place so everything continues to
> work, just disable CPU access to it.
> 
> Maybe some kind of special swap entry?
> 
> I also don't much like the use of ZONE_DEVICE here, that should only
> be used for actual device memory, not as a temporary proxy for CPU
> pages.. Having two struct pages refer to the same physical memory is
> pretty ugly.
> 
> > The normal solution of registering an MMU notifier to unpin the page when it 
> > needs to be moved also doesn't work as the CPU page tables now point to the
> > device-private page and hence the migration code won't call any invalidate 
> > notifiers for the CPU page.
> 
> The fact the page is lost from the MM seems to be the main issue here.
> 
> > Yes, I would like to avoid the long term pin constraints as well if possible I 
> > just haven't found a solution yet. Are you suggesting it might be possible to 
> > add a callback in the page migration logic to specially deal with moving these 
> > pages?
> 
> How would migration even find the page?

Migration can scan memory from physical address (isolate_migratepages_range())
So the CPU mapping is not the only path to get to a page.

Cheers,
Jérôme
Daniel Vetter Feb. 10, 2021, 12:59 p.m. UTC | #9
On Tue, Feb 09, 2021 at 12:53:27PM -0800, John Hubbard wrote:
> On 2/9/21 5:37 AM, Daniel Vetter wrote:
> > On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple <apopple@nvidia.com> wrote:
> > > 
> > > On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
> > > > > 
> > > > > Recent changes to pin_user_pages() prevent the creation of pinned pages in
> > > > > ZONE_MOVABLE. This series allows pinned pages to be created in
> > > ZONE_MOVABLE
> > > > > as attempts to migrate may fail which would be fatal to userspace.
> > > > > 
> > > > > In this case migration of the pinned page is unnecessary as the page can
> > > be
> > > > > unpinned at anytime by having the driver revoke atomic permission as it
> > > > > does for the migrate_to_ram() callback. However a method of calling this
> > > > > when memory needs to be moved has yet to be resolved so any discussion is
> > > > > welcome.
> > > > 
> > > > Why do we need to pin for gpu atomics? You still have the callback for
> > > > cpu faults, so you
> > > > can move the page as needed, and hence a long-term pin sounds like the
> > > > wrong approach.
> > > 
> > > Technically a real long term unmoveable pin isn't required, because as you say
> > > the page can be moved as needed at any time. However I needed some way of
> > > stopping the CPU page from being freed once the userspace mappings for it had
> > > been removed. Obviously I could have just used get_page() but from the
> > > perspective of page migration the result is much the same as a pin - a page
> > > which can't be moved because of the extra refcount.
> > 
> > long term pin vs short term page reference aren't fully fleshed out.
> > But the rule more or less is:
> > - short term page reference: _must_ get released in finite time for
> > migration and other things, either because you have a callback, or
> > because it's just for direct I/O, which will complete. This means
> > short term pins will delay migration, but not foul it complete
> 
> 
> GPU atomic operations to sysmem are hard to categorize, because because application
> programmers could easily write programs that do a long series of atomic operations.
> Such a program would be a little weird, but it's hard to rule out.

Yeah, but we can forcefully break this whenever we feel like by revoking
the page, moving it, and then reinstating the gpu pte again and let it
continue.

If that's no possible then what we need here instead is an mlock() type of
thing I think.

> > - long term pin: the page cannot be moved, all migration must fail.
> > Also this will have an impact on COW behaviour for fork (but not sure
> > where those patches are, John Hubbard will know).
> 
> 
> That would be Jason's commit 57efa1fe59576 ("mm/gup: prevent gup_fast from racing
> with COW during fork"), which is in linux-next 20201216.

Nice, thanks for the pointer.
> 
> 
> > 
> > So I think for your use case here you want a) short term page
> > reference to make sure it doesn't disappear plus b) callback to make
> > sure migrate isn't blocked.
> > 
> > Breaking ZONE_MOVEABLE with either allowing long term pins or failing
> > migrations because you don't release your short term page reference
> > isn't good.
> > 
> > > The normal solution of registering an MMU notifier to unpin the page when it
> > > needs to be moved also doesn't work as the CPU page tables now point to the
> > > device-private page and hence the migration code won't call any invalidate
> > > notifiers for the CPU page.
> > 
> > Yeah you need some other callback for migration on the page directly.
> > it's a bit awkward since there is one already for struct
> > address_space, but that's own by the address_space/page cache, not
> > HMM. So I think we need something else, maybe something for each
> > ZONE_DEVICE?
> > 
> 
> This direction sounds at least...possible. Using MMU notifiers instead of pins
> is definitely appealing. I'm not quite clear on the callback idea above, but
> overall it seems like taking advantage of the ZONE_DEVICE tracking of pages
> (without having to put anything additional in each struct page), could work.
> 
> Additional notes or ideas here are definitely welcome.

Well I don't have ideas for the details tbh, just from the little I
learned about how this all fits together pretending to be a pin while
pretending to not be a pin just gets us back to the mess we're trying to
solve with gup vs pup cleanup. And given that the pin_user_pages rollout
hasn't even completed yet it's maybe way to early to already toss it out
again.

I think overall we should try really hard to not mix up things between
memory that's pinned and memory that can be moved. Because retroactively
trying to fix things up just because it was easier to get a feature going
that way is very, very hard. And I think we've demonstrated countless
times that "ah we just fix this with pinning, it doesn't happen often, no
one will notice" always comes back to bite us later on :-)

Like just looking ahead, 1GB pages are a thing, we will have to support
that, migrate_page is the only way to get there, and statistically just a
1/256th chance of encountering a pinned page guarantees we'll never
assemble a giant page.

Cheers, Daniel
Jason Gunthorpe Feb. 10, 2021, 5:56 p.m. UTC | #10
On Tue, Feb 09, 2021 at 04:17:38PM -0500, Jerome Glisse wrote:

> > > Yes, I would like to avoid the long term pin constraints as well if possible I
> > > just haven't found a solution yet. Are you suggesting it might be possible to 
> > > add a callback in the page migration logic to specially deal with moving these 
> > > pages?
> > 
> > How would migration even find the page?
> 
> Migration can scan memory from physical address (isolate_migratepages_range())
> So the CPU mapping is not the only path to get to a page.

I mean find out that the page is now owned by the GPU driver to tell
it that it needs to migrate that reference. Normally that would go
through the VMA to the mmu notifier, but with the page removed from
the normal VMA it can't get to a mmu notifier or the GPU driver.

Jason
Jason Gunthorpe Feb. 10, 2021, 5:59 p.m. UTC | #11
On Tue, Feb 09, 2021 at 12:53:27PM -0800, John Hubbard wrote:

> This direction sounds at least...possible. Using MMU notifiers instead of pins
> is definitely appealing. I'm not quite clear on the callback idea above, but
> overall it seems like taking advantage of the ZONE_DEVICE tracking of pages
> (without having to put anything additional in each struct page), could work.

It isn't the ZONE_DEVICE page that needs to be tracked.

Really what you want to do here is leave the CPU page in the VMA and
the page tables where it started and deny CPU access to the page. Then
all the proper machinery will continue to work.

IMHO "migration" is the wrong idea if the data isn't actually moving.

Jason
John Hubbard Feb. 11, 2021, 2:26 a.m. UTC | #12
On 2/10/21 4:59 AM, Daniel Vetter wrote:
...
>> GPU atomic operations to sysmem are hard to categorize, because because application
>> programmers could easily write programs that do a long series of atomic operations.
>> Such a program would be a little weird, but it's hard to rule out.
> 
> Yeah, but we can forcefully break this whenever we feel like by revoking
> the page, moving it, and then reinstating the gpu pte again and let it
> continue.

Oh yes, that's true.

> 
> If that's no possible then what we need here instead is an mlock() type of
> thing I think.
No need for that, then.


thanks,
Christoph Hellwig Feb. 11, 2021, 7:55 a.m. UTC | #13
On Wed, Feb 10, 2021 at 01:59:13PM -0400, Jason Gunthorpe wrote:
> Really what you want to do here is leave the CPU page in the VMA and
> the page tables where it started and deny CPU access to the page. Then
> all the proper machinery will continue to work.
> 
> IMHO "migration" is the wrong idea if the data isn't actually moving.

Agreed.
Alistair Popple Feb. 17, 2021, 11 p.m. UTC | #14
On Thursday, 11 February 2021 6:55:10 PM AEDT Christoph Hellwig wrote:
> On Wed, Feb 10, 2021 at 01:59:13PM -0400, Jason Gunthorpe wrote:
> > Really what you want to do here is leave the CPU page in the VMA and
> > the page tables where it started and deny CPU access to the page. Then
> > all the proper machinery will continue to work.
> > 
> > IMHO "migration" is the wrong idea if the data isn't actually moving.
> 
> Agreed.
 
I chose "migration" because device private pages seemed like a good way of 
reusing existing code to do what was required (a callback on CPU access).

However I have been reworking this to use mmu notifiers as the callback and it 
seems to simplify some things nicely so think I also agree. It removes the 
requirement for the pin as well which is nice, I'll post it as a v2 soon.

 - Alistair