Message ID | 20230625231751.404120-5-bugaevc@gmail.com |
---|---|
State | New |
Headers | show |
Series | [1/5] htl: Let Mach place thread stacks | expand |
Applied, thanks! Sergey Bugaev via Libc-alpha, le lun. 26 juin 2023 02:17:51 +0300, a ecrit: > MAP_FIXED is defined to silently replace any existing mappings at the > address range being mapped over. This, however, is a dangerous, and only > rarely desired behavior. > > Various Unix systems provide replacements or additions to MAP_FIXED: > > * SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space > already contains a mapping in the requested range, Linux returns > EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the > MAP_FIXED_NOREPLACE implementation is intended to be compatible with > Linux. > > * FreeBSD provides the MAP_EXCL flag that has to be used in combination > with MAP_FIXED. It returns EINVAL if the requested range already > contains existing mappings. This is directly analogous to the O_EXCL > flag in the open () call. > > * DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with > different semantics. DragonFly BSD returns ENOMEM if the requested > range already contains existing mappings. NetBSD does not return an > error, but instead creates the mapping at a different address if the > requested range contains mappings. OpenBSD behaves the same, but also > notes that this is the default behavior even without MAP_TRYFIXED > (which is the case on the Hurd too). > > Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary > API to request the behavior of not replacing existing mappings. Declare > MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL), > so any existing software that checks for either of those macros will > pick them up automatically. For compatibility with Linux, return EEXIST > if a mapping already exists. > > Finally, a fun bit of horrifying trivia: until very recently, there has > been a function named try_mmap_fixed () in the Wine project. On Darwin, > it used mach_vm_map () to map the (anonymous) pages without replacing > any existing mappings. On Solaris and NetBSD, it instead paused the > other threads in the process using vfork (), then used mincore () to > probe the pages in the desired address range for being already mapped -- > an error return from mincore () indicates that the page is not mapped at > all. Finally, if no conflicting mappings were found, still in the > vforked child process it performed the mapping with MAP_FIXED, and then > returned from the child back into the parent, resuming the other > threads. > > This relied on: > > * being able to do calls other than execve and _exit from the vforked > child, which is undefined behavior according to POSIX; > > * the parent and the child sharing the address space, and changes made > in the child being visible in the parent; > > * vfork suspending all the threads of the calling process, not just the > calling thread. > > All of this is undefined according to POSIX, but was apparently true on > Solaris, which is the system this function was originally implemented > for. But on NetBSD, where this shim was later ported to, the last bullet > point does not hold: vfork only suspends the calling thread; while the > other threads continue to run. > > Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> > --- > sysdeps/mach/hurd/bits/mman_ext.h | 7 ++++++- > sysdeps/mach/hurd/mmap.c | 26 +++++++++++++++++--------- > 2 files changed, 23 insertions(+), 10 deletions(-) > > diff --git a/sysdeps/mach/hurd/bits/mman_ext.h b/sysdeps/mach/hurd/bits/mman_ext.h > index bbb94743..9658cdd6 100644 > --- a/sysdeps/mach/hurd/bits/mman_ext.h > +++ b/sysdeps/mach/hurd/bits/mman_ext.h > @@ -22,5 +22,10 @@ > > #ifdef __USE_GNU > # define SHM_ANON ((const char *) 1) > -# define MAP_32BIT 0x1000 > + > +# define MAP_32BIT 0x1000 /* Map in the lower 2 GB. */ > +# define MAP_EXCL 0x4000 /* With MAP_FIXED, don't replace existing mappings. */ > + > +# define MAP_TRYFIXED (MAP_FIXED | MAP_EXCL) /* BSD name. */ > +# define MAP_FIXED_NOREPLACE (MAP_FIXED | MAP_EXCL) /* Linux name. */ > #endif /* __USE_GNU */ > diff --git a/sysdeps/mach/hurd/mmap.c b/sysdeps/mach/hurd/mmap.c > index 33672cf6..20264a77 100644 > --- a/sysdeps/mach/hurd/mmap.c > +++ b/sysdeps/mach/hurd/mmap.c > @@ -46,6 +46,9 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset) > if ((mapaddr & (__vm_page_size - 1)) || (offset & (__vm_page_size - 1))) > return (void *) (long int) __hurd_fail (EINVAL); > > + if ((flags & MAP_EXCL) && ! (flags & MAP_FIXED)) > + return (void *) (long int) __hurd_fail (EINVAL); > + > vmprot = VM_PROT_NONE; > if (prot & PROT_READ) > vmprot |= VM_PROT_READ; > @@ -156,15 +159,20 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset) > { > if (err == KERN_NO_SPACE) > { > - /* XXX this is not atomic as it is in unix! */ > - /* The region is already allocated; deallocate it first. */ > - err = __vm_deallocate (__mach_task_self (), mapaddr, len); > - if (! err) > - err = __vm_map (__mach_task_self (), > - &mapaddr, (vm_size_t) len, mask, > - 0, memobj, (vm_offset_t) offset, > - copy, vmprot, max_vmprot, > - copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE); > + if (flags & MAP_EXCL) > + err = EEXIST; > + else > + { > + /* The region is already allocated; deallocate it first. */ > + /* XXX this is not atomic as it is in unix! */ > + err = __vm_deallocate (__mach_task_self (), mapaddr, len); > + if (! err) > + err = __vm_map (__mach_task_self (), > + &mapaddr, (vm_size_t) len, mask, > + 0, memobj, (vm_offset_t) offset, > + copy, vmprot, max_vmprot, > + copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE); > + } > } > } > else > -- > 2.41.0 >
diff --git a/sysdeps/mach/hurd/bits/mman_ext.h b/sysdeps/mach/hurd/bits/mman_ext.h index bbb94743..9658cdd6 100644 --- a/sysdeps/mach/hurd/bits/mman_ext.h +++ b/sysdeps/mach/hurd/bits/mman_ext.h @@ -22,5 +22,10 @@ #ifdef __USE_GNU # define SHM_ANON ((const char *) 1) -# define MAP_32BIT 0x1000 + +# define MAP_32BIT 0x1000 /* Map in the lower 2 GB. */ +# define MAP_EXCL 0x4000 /* With MAP_FIXED, don't replace existing mappings. */ + +# define MAP_TRYFIXED (MAP_FIXED | MAP_EXCL) /* BSD name. */ +# define MAP_FIXED_NOREPLACE (MAP_FIXED | MAP_EXCL) /* Linux name. */ #endif /* __USE_GNU */ diff --git a/sysdeps/mach/hurd/mmap.c b/sysdeps/mach/hurd/mmap.c index 33672cf6..20264a77 100644 --- a/sysdeps/mach/hurd/mmap.c +++ b/sysdeps/mach/hurd/mmap.c @@ -46,6 +46,9 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset) if ((mapaddr & (__vm_page_size - 1)) || (offset & (__vm_page_size - 1))) return (void *) (long int) __hurd_fail (EINVAL); + if ((flags & MAP_EXCL) && ! (flags & MAP_FIXED)) + return (void *) (long int) __hurd_fail (EINVAL); + vmprot = VM_PROT_NONE; if (prot & PROT_READ) vmprot |= VM_PROT_READ; @@ -156,15 +159,20 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset) { if (err == KERN_NO_SPACE) { - /* XXX this is not atomic as it is in unix! */ - /* The region is already allocated; deallocate it first. */ - err = __vm_deallocate (__mach_task_self (), mapaddr, len); - if (! err) - err = __vm_map (__mach_task_self (), - &mapaddr, (vm_size_t) len, mask, - 0, memobj, (vm_offset_t) offset, - copy, vmprot, max_vmprot, - copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE); + if (flags & MAP_EXCL) + err = EEXIST; + else + { + /* The region is already allocated; deallocate it first. */ + /* XXX this is not atomic as it is in unix! */ + err = __vm_deallocate (__mach_task_self (), mapaddr, len); + if (! err) + err = __vm_map (__mach_task_self (), + &mapaddr, (vm_size_t) len, mask, + 0, memobj, (vm_offset_t) offset, + copy, vmprot, max_vmprot, + copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE); + } } } else
MAP_FIXED is defined to silently replace any existing mappings at the address range being mapped over. This, however, is a dangerous, and only rarely desired behavior. Various Unix systems provide replacements or additions to MAP_FIXED: * SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space already contains a mapping in the requested range, Linux returns EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the MAP_FIXED_NOREPLACE implementation is intended to be compatible with Linux. * FreeBSD provides the MAP_EXCL flag that has to be used in combination with MAP_FIXED. It returns EINVAL if the requested range already contains existing mappings. This is directly analogous to the O_EXCL flag in the open () call. * DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with different semantics. DragonFly BSD returns ENOMEM if the requested range already contains existing mappings. NetBSD does not return an error, but instead creates the mapping at a different address if the requested range contains mappings. OpenBSD behaves the same, but also notes that this is the default behavior even without MAP_TRYFIXED (which is the case on the Hurd too). Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary API to request the behavior of not replacing existing mappings. Declare MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL), so any existing software that checks for either of those macros will pick them up automatically. For compatibility with Linux, return EEXIST if a mapping already exists. Finally, a fun bit of horrifying trivia: until very recently, there has been a function named try_mmap_fixed () in the Wine project. On Darwin, it used mach_vm_map () to map the (anonymous) pages without replacing any existing mappings. On Solaris and NetBSD, it instead paused the other threads in the process using vfork (), then used mincore () to probe the pages in the desired address range for being already mapped -- an error return from mincore () indicates that the page is not mapped at all. Finally, if no conflicting mappings were found, still in the vforked child process it performed the mapping with MAP_FIXED, and then returned from the child back into the parent, resuming the other threads. This relied on: * being able to do calls other than execve and _exit from the vforked child, which is undefined behavior according to POSIX; * the parent and the child sharing the address space, and changes made in the child being visible in the parent; * vfork suspending all the threads of the calling process, not just the calling thread. All of this is undefined according to POSIX, but was apparently true on Solaris, which is the system this function was originally implemented for. But on NetBSD, where this shim was later ported to, the last bullet point does not hold: vfork only suspends the calling thread; while the other threads continue to run. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> --- sysdeps/mach/hurd/bits/mman_ext.h | 7 ++++++- sysdeps/mach/hurd/mmap.c | 26 +++++++++++++++++--------- 2 files changed, 23 insertions(+), 10 deletions(-)