diff mbox series

[5/5] hurd: Implement MAP_EXCL

Message ID 20230625231751.404120-5-bugaevc@gmail.com
State New
Headers show
Series [1/5] htl: Let Mach place thread stacks | expand

Commit Message

Sergey Bugaev June 25, 2023, 11:17 p.m. UTC
MAP_FIXED is defined to silently replace any existing mappings at the
address range being mapped over. This, however, is a dangerous, and only
rarely desired behavior.

Various Unix systems provide replacements or additions to MAP_FIXED:

* SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space
  already contains a mapping in the requested range, Linux returns
  EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the
  MAP_FIXED_NOREPLACE implementation is intended to be compatible with
  Linux.

* FreeBSD provides the MAP_EXCL flag that has to be used in combination
  with MAP_FIXED. It returns EINVAL if the requested range already
  contains existing mappings. This is directly analogous to the O_EXCL
  flag in the open () call.

* DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with
  different semantics. DragonFly BSD returns ENOMEM if the requested
  range already contains existing mappings. NetBSD does not return an
  error, but instead creates the mapping at a different address if the
  requested range contains mappings. OpenBSD behaves the same, but also
  notes that this is the default behavior even without MAP_TRYFIXED
  (which is the case on the Hurd too).

Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary
API to request the behavior of not replacing existing mappings. Declare
MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL),
so any existing software that checks for either of those macros will
pick them up automatically. For compatibility with Linux, return EEXIST
if a mapping already exists.

Finally, a fun bit of horrifying trivia: until very recently, there has
been a function named try_mmap_fixed () in the Wine project. On Darwin,
it used mach_vm_map () to map the (anonymous) pages without replacing
any existing mappings. On Solaris and NetBSD, it instead paused the
other threads in the process using vfork (), then used mincore () to
probe the pages in the desired address range for being already mapped --
an error return from mincore () indicates that the page is not mapped at
all. Finally, if no conflicting mappings were found, still in the
vforked child process it performed the mapping with MAP_FIXED, and then
returned from the child back into the parent, resuming the other
threads.

This relied on:

* being able to do calls other than execve and _exit from the vforked
  child, which is undefined behavior according to POSIX;

* the parent and the child sharing the address space, and changes made
  in the child being visible in the parent;

* vfork suspending all the threads of the calling process, not just the
  calling thread.

All of this is undefined according to POSIX, but was apparently true on
Solaris, which is the system this function was originally implemented
for. But on NetBSD, where this shim was later ported to, the last bullet
point does not hold: vfork only suspends the calling thread; while the
other threads continue to run.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
---
 sysdeps/mach/hurd/bits/mman_ext.h |  7 ++++++-
 sysdeps/mach/hurd/mmap.c          | 26 +++++++++++++++++---------
 2 files changed, 23 insertions(+), 10 deletions(-)

Comments

Samuel Thibault July 2, 2023, 11:37 p.m. UTC | #1
Applied, thanks!

Sergey Bugaev via Libc-alpha, le lun. 26 juin 2023 02:17:51 +0300, a ecrit:
> MAP_FIXED is defined to silently replace any existing mappings at the
> address range being mapped over. This, however, is a dangerous, and only
> rarely desired behavior.
> 
> Various Unix systems provide replacements or additions to MAP_FIXED:
> 
> * SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space
>   already contains a mapping in the requested range, Linux returns
>   EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the
>   MAP_FIXED_NOREPLACE implementation is intended to be compatible with
>   Linux.
> 
> * FreeBSD provides the MAP_EXCL flag that has to be used in combination
>   with MAP_FIXED. It returns EINVAL if the requested range already
>   contains existing mappings. This is directly analogous to the O_EXCL
>   flag in the open () call.
> 
> * DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with
>   different semantics. DragonFly BSD returns ENOMEM if the requested
>   range already contains existing mappings. NetBSD does not return an
>   error, but instead creates the mapping at a different address if the
>   requested range contains mappings. OpenBSD behaves the same, but also
>   notes that this is the default behavior even without MAP_TRYFIXED
>   (which is the case on the Hurd too).
> 
> Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary
> API to request the behavior of not replacing existing mappings. Declare
> MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL),
> so any existing software that checks for either of those macros will
> pick them up automatically. For compatibility with Linux, return EEXIST
> if a mapping already exists.
> 
> Finally, a fun bit of horrifying trivia: until very recently, there has
> been a function named try_mmap_fixed () in the Wine project. On Darwin,
> it used mach_vm_map () to map the (anonymous) pages without replacing
> any existing mappings. On Solaris and NetBSD, it instead paused the
> other threads in the process using vfork (), then used mincore () to
> probe the pages in the desired address range for being already mapped --
> an error return from mincore () indicates that the page is not mapped at
> all. Finally, if no conflicting mappings were found, still in the
> vforked child process it performed the mapping with MAP_FIXED, and then
> returned from the child back into the parent, resuming the other
> threads.
> 
> This relied on:
> 
> * being able to do calls other than execve and _exit from the vforked
>   child, which is undefined behavior according to POSIX;
> 
> * the parent and the child sharing the address space, and changes made
>   in the child being visible in the parent;
> 
> * vfork suspending all the threads of the calling process, not just the
>   calling thread.
> 
> All of this is undefined according to POSIX, but was apparently true on
> Solaris, which is the system this function was originally implemented
> for. But on NetBSD, where this shim was later ported to, the last bullet
> point does not hold: vfork only suspends the calling thread; while the
> other threads continue to run.
> 
> Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
> ---
>  sysdeps/mach/hurd/bits/mman_ext.h |  7 ++++++-
>  sysdeps/mach/hurd/mmap.c          | 26 +++++++++++++++++---------
>  2 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/sysdeps/mach/hurd/bits/mman_ext.h b/sysdeps/mach/hurd/bits/mman_ext.h
> index bbb94743..9658cdd6 100644
> --- a/sysdeps/mach/hurd/bits/mman_ext.h
> +++ b/sysdeps/mach/hurd/bits/mman_ext.h
> @@ -22,5 +22,10 @@
>  
>  #ifdef __USE_GNU
>  # define SHM_ANON	((const char *) 1)
> -# define MAP_32BIT	0x1000
> +
> +# define MAP_32BIT	0x1000	/* Map in the lower 2 GB.  */
> +# define MAP_EXCL	0x4000	/* With MAP_FIXED, don't replace existing mappings.  */
> +
> +# define MAP_TRYFIXED		(MAP_FIXED | MAP_EXCL)	/* BSD name.  */
> +# define MAP_FIXED_NOREPLACE	(MAP_FIXED | MAP_EXCL)	/* Linux name.  */
>  #endif /* __USE_GNU  */
> diff --git a/sysdeps/mach/hurd/mmap.c b/sysdeps/mach/hurd/mmap.c
> index 33672cf6..20264a77 100644
> --- a/sysdeps/mach/hurd/mmap.c
> +++ b/sysdeps/mach/hurd/mmap.c
> @@ -46,6 +46,9 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset)
>    if ((mapaddr & (__vm_page_size - 1)) || (offset & (__vm_page_size - 1)))
>      return (void *) (long int) __hurd_fail (EINVAL);
>  
> +  if ((flags & MAP_EXCL) && ! (flags & MAP_FIXED))
> +    return (void *) (long int) __hurd_fail (EINVAL);
> +
>    vmprot = VM_PROT_NONE;
>    if (prot & PROT_READ)
>      vmprot |= VM_PROT_READ;
> @@ -156,15 +159,20 @@ __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset)
>      {
>        if (err == KERN_NO_SPACE)
>  	{
> -	  /* XXX this is not atomic as it is in unix! */
> -	  /* The region is already allocated; deallocate it first.  */
> -	  err = __vm_deallocate (__mach_task_self (), mapaddr, len);
> -	  if (! err)
> -	    err = __vm_map (__mach_task_self (),
> -			    &mapaddr, (vm_size_t) len, mask,
> -			    0, memobj, (vm_offset_t) offset,
> -			    copy, vmprot, max_vmprot,
> -			    copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE);
> +	  if (flags & MAP_EXCL)
> +	    err = EEXIST;
> +	  else
> +	    {
> +	      /* The region is already allocated; deallocate it first.  */
> +	      /* XXX this is not atomic as it is in unix! */
> +	      err = __vm_deallocate (__mach_task_self (), mapaddr, len);
> +	      if (! err)
> +		err = __vm_map (__mach_task_self (),
> +				&mapaddr, (vm_size_t) len, mask,
> +				0, memobj, (vm_offset_t) offset,
> +				copy, vmprot, max_vmprot,
> +				copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE);
> +	    }
>  	}
>      }
>    else
> -- 
> 2.41.0
>
diff mbox series

Patch

diff --git a/sysdeps/mach/hurd/bits/mman_ext.h b/sysdeps/mach/hurd/bits/mman_ext.h
index bbb94743..9658cdd6 100644
--- a/sysdeps/mach/hurd/bits/mman_ext.h
+++ b/sysdeps/mach/hurd/bits/mman_ext.h
@@ -22,5 +22,10 @@ 
 
 #ifdef __USE_GNU
 # define SHM_ANON	((const char *) 1)
-# define MAP_32BIT	0x1000
+
+# define MAP_32BIT	0x1000	/* Map in the lower 2 GB.  */
+# define MAP_EXCL	0x4000	/* With MAP_FIXED, don't replace existing mappings.  */
+
+# define MAP_TRYFIXED		(MAP_FIXED | MAP_EXCL)	/* BSD name.  */
+# define MAP_FIXED_NOREPLACE	(MAP_FIXED | MAP_EXCL)	/* Linux name.  */
 #endif /* __USE_GNU  */
diff --git a/sysdeps/mach/hurd/mmap.c b/sysdeps/mach/hurd/mmap.c
index 33672cf6..20264a77 100644
--- a/sysdeps/mach/hurd/mmap.c
+++ b/sysdeps/mach/hurd/mmap.c
@@ -46,6 +46,9 @@  __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset)
   if ((mapaddr & (__vm_page_size - 1)) || (offset & (__vm_page_size - 1)))
     return (void *) (long int) __hurd_fail (EINVAL);
 
+  if ((flags & MAP_EXCL) && ! (flags & MAP_FIXED))
+    return (void *) (long int) __hurd_fail (EINVAL);
+
   vmprot = VM_PROT_NONE;
   if (prot & PROT_READ)
     vmprot |= VM_PROT_READ;
@@ -156,15 +159,20 @@  __mmap (void *addr, size_t len, int prot, int flags, int fd, off_t offset)
     {
       if (err == KERN_NO_SPACE)
 	{
-	  /* XXX this is not atomic as it is in unix! */
-	  /* The region is already allocated; deallocate it first.  */
-	  err = __vm_deallocate (__mach_task_self (), mapaddr, len);
-	  if (! err)
-	    err = __vm_map (__mach_task_self (),
-			    &mapaddr, (vm_size_t) len, mask,
-			    0, memobj, (vm_offset_t) offset,
-			    copy, vmprot, max_vmprot,
-			    copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE);
+	  if (flags & MAP_EXCL)
+	    err = EEXIST;
+	  else
+	    {
+	      /* The region is already allocated; deallocate it first.  */
+	      /* XXX this is not atomic as it is in unix! */
+	      err = __vm_deallocate (__mach_task_self (), mapaddr, len);
+	      if (! err)
+		err = __vm_map (__mach_task_self (),
+				&mapaddr, (vm_size_t) len, mask,
+				0, memobj, (vm_offset_t) offset,
+				copy, vmprot, max_vmprot,
+				copy ? VM_INHERIT_COPY : VM_INHERIT_SHARE);
+	    }
 	}
     }
   else