
[RFC,v2,00/13] nommu UML

Message ID cover.1731290567.git.thehajime@gmail.com
Series nommu UML

Message

Hajime Tazaki Nov. 11, 2024, 6:27 a.m. UTC
This is a series of patches adding nommu arch support to UML.
Comments and opinions on it would be appreciated.

There are still several known limitations/issues:

- the prompt configured in /etc/profile is broken (variables are not
  expanded, e.g. ${HOSTNAME%%.*}:$PWD#)
- there is no mechanism yet to cache the mapped memory of exec(2), so
  files are always read from the filesystem upon every exec, which
  slows down some benchmarks (lmbench).

-- Hajime


RFC v2:
- base branch is now uml/linux.git instead of torvalds/linux.git.
- reorganized the patch series for cleanup
- fixed various coding style issues
- cleaned up the exec code path [07/13]
- fixed the crash/SIGSEGV case on userspace programs [10/13]
- added a seccomp filter to limit syscall caller addresses [06/13]
- now detects fsgsbase availability with sigsetjmp/siglongjmp [08/13]
- removed unrelated changes
- removed unneeded ifndef CONFIG_MMU
- converted UML_CONFIG_MMU to CONFIG_MMU, now that the series is based on uml/linux.git
- proposed a patch for a maple-tree issue (resolving a limitation in RFC v1)
  https://lore.kernel.org/linux-mm/20241108222834.3625217-1-thehajime@gmail.com/
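The changelog only names the technique used in [08/13]. As a minimal sketch, assuming an x86-64 Linux host, FSGSBASE availability can be probed by executing RDFSBASE once and treating a SIGILL as "not available", using sigsetjmp/siglongjmp to recover. The function names below are hypothetical; this is not the code from the patch.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf fsgsbase_env;

/* SIGILL handler: jump back to the probe point instead of dying. */
static void fsgsbase_sigill(int sig)
{
	(void)sig;
	siglongjmp(fsgsbase_env, 1);
}

/* Hypothetical helper: returns 1 if RDFSBASE runs in user space, else 0. */
int probe_fsgsbase(void)
{
	struct sigaction sa, old;
	volatile int ok = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fsgsbase_sigill;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGILL, &sa, &old) < 0)
		return 0;

	/* savemask=1 so the signal mask is restored by siglongjmp. */
	if (sigsetjmp(fsgsbase_env, 1) == 0) {
#if defined(__x86_64__)
		uint64_t base;

		/* Raises SIGILL unless the host kernel enabled CR4.FSGSBASE. */
		__asm__ __volatile__("rdfsbase %0" : "=r"(base));
		(void)base;
		ok = 1;
#endif
	}

	sigaction(SIGILL, &old, NULL);	/* restore the previous handler */
	return ok;
}
```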

RFC:
- https://lore.kernel.org/linux-um/cover.1729770373.git.thehajime@gmail.com/

Hajime Tazaki (13):
  fs: binfmt_elf_efpic: add architecture hook elf_arch_finalize_exec
  x86/um: nommu: elf loader for fdpic
  um: nommu: memory handling
  x86/um: nommu: syscall handling
  x86/um: nommu: syscall translation by zpoline
  um: nommu: prevent host syscalls from userspace by seccomp filter
  x86/um: nommu: process/thread handling
  um: nommu: configure fs register on host syscall invocation
  x86/um/vdso: nommu: vdso memory update
  x86/um: nommu: signal handling
  um: change machine name for uname output
  um: nommu: add documentation of nommu UML
  um: nommu: plug nommu code into build system

 Documentation/virt/uml/nommu-uml.rst    | 221 +++++++++++++++++++++++
 arch/um/Kconfig                         |  14 +-
 arch/um/Makefile                        |   6 +
 arch/um/configs/x86_64_nommu_defconfig  |  64 +++++++
 arch/um/include/asm/Kbuild              |   1 +
 arch/um/include/asm/futex.h             |   4 +
 arch/um/include/asm/mmu.h               |   8 +
 arch/um/include/asm/mmu_context.h       |  13 +-
 arch/um/include/asm/ptrace-generic.h    |   6 +
 arch/um/include/asm/tlbflush.h          |  22 +++
 arch/um/include/asm/uaccess.h           |   7 +-
 arch/um/include/shared/kern_util.h      |   3 +
 arch/um/include/shared/os.h             |  14 ++
 arch/um/kernel/Makefile                 |   3 +-
 arch/um/kernel/mem.c                    |  12 +-
 arch/um/kernel/physmem.c                |   6 +
 arch/um/kernel/process.c                |  33 +++-
 arch/um/kernel/skas/Makefile            |   4 +-
 arch/um/kernel/trap.c                   |  14 ++
 arch/um/kernel/um_arch.c                |   4 +
 arch/um/os-Linux/Makefile               |   5 +-
 arch/um/os-Linux/cpu.c                  |  50 ++++++
 arch/um/os-Linux/internal.h             |   5 +
 arch/um/os-Linux/main.c                 |   5 +
 arch/um/os-Linux/process.c              |  94 +++++++++-
 arch/um/os-Linux/signal.c               |  18 +-
 arch/um/os-Linux/skas/process.c         |   4 +
 arch/um/os-Linux/start_up.c             |   3 +
 arch/um/os-Linux/util.c                 |   3 +-
 arch/x86/um/Makefile                    |  18 ++
 arch/x86/um/asm/elf.h                   |  11 +-
 arch/x86/um/asm/module.h                |  24 ---
 arch/x86/um/asm/processor.h             |  12 ++
 arch/x86/um/do_syscall_64.c             | 108 ++++++++++++
 arch/x86/um/entry_64.S                  | 108 ++++++++++++
 arch/x86/um/shared/sysdep/syscalls_64.h |   6 +
 arch/x86/um/signal.c                    |  37 +++-
 arch/x86/um/syscalls_64.c               |  69 ++++++++
 arch/x86/um/vdso/um_vdso.c              |  20 +++
 arch/x86/um/vdso/vma.c                  |  14 ++
 arch/x86/um/zpoline.c                   | 223 ++++++++++++++++++++++++
 fs/Kconfig.binfmt                       |   2 +-
 fs/binfmt_elf_fdpic.c                   |  10 ++
 43 files changed, 1262 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/virt/uml/nommu-uml.rst
 create mode 100644 arch/um/configs/x86_64_nommu_defconfig
 create mode 100644 arch/um/os-Linux/cpu.c
 delete mode 100644 arch/x86/um/asm/module.h
 create mode 100644 arch/x86/um/do_syscall_64.c
 create mode 100644 arch/x86/um/entry_64.S
 create mode 100644 arch/x86/um/zpoline.c

Comments

Johannes Berg Nov. 15, 2024, 10:12 a.m. UTC | #1
On Mon, 2024-11-11 at 15:27 +0900, Hajime Tazaki wrote:
> This is a series of patches adding nommu arch support to UML.
> Comments and opinions on it would be appreciated.

So I've been thinking about this for a while now...

To be clear, I'm not really _against_ it. With around 1200 lines of
code, it really isn't even big. But I also don't know how brittle it is?
Testing it is made somewhat difficult with the map-at-zero requirement
too.


And really I keep coming back to asking myself what the use case is?

Is it to test something for no-MMU platforms more easily? But I'm not
sure what that would be? Have any no-MMU platform maintainers weighed in
on this, have they even _seen_ it? Is that interesting? Is it more
interesting than testing an emulated system with the right architecture?
With it this way you'd probably have to build the right libraries and
binaries for x86-64 no-MMU, does such a thing already exist somewhere?

It also doesn't look like it's meant to replace LKL? But even LKL I
don't really know - are people using it, and if so what for? Seems
lklfuse is a thing for some BSD folks?

Is there something else to use it for?

If it's the first (test no-MMU) then it probably should be smarter about
not really relying on retpoline. Why is the focus so much on that
anyway? If testing no-MMU was the most important thing then probably
you'd have started with seccomp, and actually execute the syscalls from
that, to not have all those restrictions that come from rewriting
binaries, rather than ignoring the whole thing. Though of course you did
add a filter now, but I think it'll just crash?
So I could perhaps see this use case, but then I'd probably think it
should be more generic (i.e. able to execute all no-MMU binaries
including ones that may be using JIT compilation etc.) and not _require_
retpoline, but rather use it as an optimisation where that's possible
(i.e. if you can map at zero)?

If the use case is instead more LKL-type usage, I guess I don't really
understand it, though to be honest I also don't really fully understand
LKL itself, but it always _seemed_ very different.

Somewhat hyperbolically, I'm wondering if it's just a tech demo for
retpoline?

So I dunno. Reading through it again there are a few minor things wrt.
code style and debug things left over, but it's not awful ;-) I'd also
prefer the code to be more clearly "marked" (as nommu), perhaps putting
new files into a nommu/ directory, or something like that. But that's
pretty minor.

Still it's in a lot of places and chances are it'll make bigger
refactoring (like seccomp mode) harder. Perhaps if at all it should come
after seccomp mode and use that to execute syscalls if zpoline can't be
done, and to catch all the cases where zpoline doesn't work (you have
that in the docs)?

What do others think? Would you use it? What for?

johannes
Anton Ivanov Nov. 15, 2024, 10:26 a.m. UTC | #2
On 15/11/2024 10:12, Johannes Berg wrote:
> What do others think? Would you use it? What for?

I always thought of it as "another LKL". In that case, it can be compared
to LKL on merit and if it is equivalent or better - go into kernel.

If there is another use case, I will be glad to hear it.

Hajime Tazaki Nov. 15, 2024, 2:48 p.m. UTC | #3
Hello Johannes,

# added Geert, Greg, Rich to Cc (sorry for the noise)
# here is the original email of this thread: just in case.
# https://lore.kernel.org/linux-um/cover.1731290567.git.thehajime@gmail.com/

On Fri, 15 Nov 2024 19:12:39 +0900,
Johannes Berg wrote:
> 
> On Mon, 2024-11-11 at 15:27 +0900, Hajime Tazaki wrote:
> > This is a series of patches adding nommu arch support to UML.
> > Comments and opinions on it would be appreciated.
> 
> So I've been thinking about this for a while now...

Thank you for your time!

> To be clear, I'm not really _against_ it. With around 1200 lines of
> code, it really isn't even big. But I also don't know how brittle it is?
> Testing it is made somewhat difficult with the map-at-zero requirement
> too.

Given that CI/testing facilities nowadays mostly run on VMs,
configuring /proc/sys/vm/mmap_min_addr=0 in order to test this feature
is not so difficult.
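To check whether a host or CI VM meets the map-at-zero requirement before running tests, something like the following minimal sketch could be used. It assumes a Linux host; the helper names are hypothetical and not part of the series.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read vm.mmap_min_addr into *out; 0 on success. */
int read_mmap_min_addr(unsigned long *out)
{
	FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%lu", out) == 1)
		ret = 0;
	fclose(f);
	return ret;
}

/* Hypothetical helper: try to map one page at virtual address 0 (where a
 * zpoline trampoline would live). Returns 1 if allowed, 0 otherwise. */
int can_map_at_zero(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap((void *)0, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return 0;
	munmap(p, len);	/* only probing; release the page again */
	return 1;
}
```

On a test VM, `sysctl -w vm.mmap_min_addr=0` (as root) lowers the limit; without that, or without CAP_SYS_RAWIO, can_map_at_zero() typically returns 0.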

> And really I keep coming back to asking myself what the use case is?
> 
> Is it to test something for no-MMU platforms more easily? But I'm not
> sure what that would be? Have any no-MMU platform maintainers weighed in
> on this, have they even _seen_ it? Is that interesting? Is it more
> interesting than testing an emulated system with the right architecture?

Let me explain one recent experience for the use case.

I spotted (and fixed, now in Linus' tree) an issue in the vma
subsystem's use of the maple-tree library during the development of
this patch series.

There is a (slightly) long thread (below) discussing it with the
maple-tree maintainer, Liam.

- traversing vma on nommu
https://lists.infradead.org/pipermail/maple-tree/2024-November/003740.html

Bisecting showed that the issue is reproducible from v6.12-rc1 onward,
but it never happened with the other nommu archs (we tested m68k and
riscv, both on buildroot qemu).  Maybe because I'm more familiar with
nommu UML than with m68k/riscv on qemu, I could comfortably
reproduce/debug/test what was going on with gdb, and finally proposed
a fix (a one-liner patch).

- the patch (hopefully it will land in the 6.12 release)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=247d720b2c5d22f7281437fd6054a138256986ba

This is only one example of its usefulness.  I believe you can also
imagine the same happening with regular (MMU) UML.

I also privately run a CI test which verifies that my patch doesn't
break MMU UML, with a simple boot test (static/dynamic), 12 kunit
tests in the kernel tree, basic benchmarks with lmbench, etc.  This is
not specific to nommu UML, though.

https://github.com/thehajime/linux/actions/runs/11811327291
# The above URL may expire in future.


> With it this way you'd probably have to build the right libraries and
> binaries for x86-64 no-MMU, does such a thing already exist somewhere?

I'm preparing patches for upstream Alpine Linux so that such binaries
become available in an appropriate way.  Note that I didn't modify the
code of the programs themselves (except for a clear bug); they are just
built with the NOMMU option, which is already implemented in
busybox/musl-libc.

https://gitlab.alpinelinux.org/thehajime/aports/-/merge_requests/2/diffs

I have not contacted the upstream developers yet, so this diff might change.

> It also doesn't look like it's meant to replace LKL? But even LKL I
> don't really know - are people using it, and if so what for? Seems
> lklfuse is a thing for some BSD folks?
> 
> Is there something else to use it for?

This patchset is independent of, and unrelated to, LKL.
# you may have assumed that I'm still working on LKL.

(off topic)
lklfuse is indeed used by FreeBSD, but not well maintained (afaik).
NixOS (a Linux distribution built around the Nix package manager) also
uses lklfuse, iirc.


> If it's the first (test no-MMU) then it probably should be smarter about
> not really relying on retpoline.

# I assume s/retpoline/zpoline/ in the rest of your message.

> Why is the focus so much on that
> anyway? If testing no-MMU was the most important thing then probably
> you'd have started with seccomp, and actually execute the syscalls from
> that, to not have all those restrictions that come from rewriting
> binaries, rather than ignoring the whole thing.

The JIT part (and also syscalls from dlopen-ed binaries) can be
implemented, as I mentioned in the other reply, but is not yet.

The choice of zpoline is based on the speed of syscall invocations.
Our investigation found that seccomp (and similar mechanisms such as
SUD (syscall user dispatch), ptrace, and int3 signaling) is still
slower than binary rewriting, because of the nature of the signal
delivery in those mechanisms.  LD_PRELOAD with symbol rewriting is
faster (even faster than binary rewriting) but fundamentally cannot
hook all syscalls.

zpoline tries to fill this gap, and we thought it fits the UML
usage.
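For readers unfamiliar with zpoline, the core trick can be sketched in a few lines of C. This is a simplified illustration, not the code in [05/13]; the helper names, the nr_max bound, and the handler address are hypothetical. Each 2-byte `syscall` (0f 05) becomes the equally long `call *%rax` (ff d0); since rax holds the syscall number, the call lands in a NOP sled mapped at address 0 that slides into a jump to the handler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: turn the 2-byte "syscall" (0f 05) at text[off] into
 * the 2-byte "call *%rax" (ff d0), so code layout never changes.
 * The caller must already know off is an instruction boundary (zpoline uses
 * a disassembler; a raw byte search for 0f 05 would also hit immediates).
 * Returns 0 on success, -1 otherwise. */
int zp_rewrite_syscall(uint8_t *text, size_t len, size_t off)
{
	if (off > len || len - off < 2)
		return -1;
	if (text[off] != 0x0f || text[off + 1] != 0x05)
		return -1;
	text[off] = 0xff;	/* call ...  */
	text[off + 1] = 0xd0;	/* ... *%rax */
	return 0;
}

/* Hypothetical helper: build the image meant to be mapped at address 0.
 * Bytes [0, nr_max) are one-byte NOPs, so "call *%rax" with rax = syscall
 * number lands in the sled and slides to the jump at offset nr_max.
 * The jump goes through %r11, which "syscall" clobbers anyway.
 * Returns the image size, or 0 if buf is too small. */
size_t zp_build_trampoline(uint8_t *buf, size_t cap, size_t nr_max,
			   uint64_t handler)
{
	size_t need = nr_max + 13;	/* sled + movabs (10) + jmp (3) */

	if (cap < need)
		return 0;
	memset(buf, 0x90, nr_max);		/* nop sled */
	buf[nr_max] = 0x49;			/* movabs $handler, %r11 */
	buf[nr_max + 1] = 0xbb;
	memcpy(buf + nr_max + 2, &handler, 8);	/* imm64, little-endian host */
	buf[nr_max + 10] = 0x41;		/* jmp *%r11 */
	buf[nr_max + 11] = 0xff;
	buf[nr_max + 12] = 0xe3;
	return need;
}
```

The return address pushed by the call points just past the rewritten instruction, so the handler can simply return there after emulating the syscall.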

> Though of course you did
> add a filter now, but I think it'll just crash?

This part (it just crashes with SIGSYS) can be improved.
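One way to narrow the SIGSYS case is a classic-BPF seccomp filter keyed on `seccomp_data.instruction_pointer` that returns SECCOMP_RET_TRAP rather than killing, so a SIGSYS handler could emulate the syscall instead of crashing. This is a sketch under stated assumptions (hypothetical helper name, x86-64 little-endian host, allowed caller range within one 4 GiB region), not the filter from [06/13].

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define ZP_IP_LO offsetof(struct seccomp_data, instruction_pointer)
#define ZP_IP_HI (ZP_IP_LO + 4)	/* upper half on little-endian x86_64 */

/* Hypothetical helper: fill prog (room for >= 9 entries) with a filter that
 * allows a syscall only if the caller's IP is in [lo, hi); anything else
 * gets SECCOMP_RET_TRAP (SIGSYS). Assumes lo and hi share the upper 32 bits.
 * Returns the number of entries written. */
size_t build_caller_filter(struct sock_filter *prog, uint64_t lo, uint64_t hi)
{
	size_t n = 0;

	/* [0-1] only native x86_64 syscalls, otherwise jump to [8] */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, arch));
	prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			AUDIT_ARCH_X86_64, 0, 6);
	/* [2-3] upper 32 bits of the IP must match */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ZP_IP_HI);
	prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			(uint32_t)(lo >> 32), 0, 4);
	/* [4-6] lower 32 bits must satisfy lo <= ip < hi (unsigned compares) */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ZP_IP_LO);
	prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,
			(uint32_t)lo, 0, 2);
	prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,
			(uint32_t)hi, 1, 0);
	/* [7] allowed, [8] trapped */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP);
	return n;
}
```

Installing it would use `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)` followed by `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog)` with `struct sock_fprog fprog = { .len = n, .filter = prog }`, together with a SIGSYS handler installed via sigaction.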

> So I could perhaps see this use case, but then I'd probably think it
> should be more generic (i.e. able to execute all no-MMU binaries
> including ones that may be using JIT compilation etc.) and not _require_
> retpoline, but rather use it as an optimisation where that's possible
> (i.e. if you can map at zero)?

I understand your point.

> If the use case is instead more LKL-type usage, I guess I don't really
> understand it, though to be honest I also don't really fully understand
> LKL itself, but it always _seemed_ very different.

I didn't explain the comparison between LKL and nommu UML, as I
thought they are independent of each other.

> Somewhat hyperbolically, I'm wondering if it's just a tech demo for
> retpoline?

An additional reason we used zpoline to replace the syscall instruction is:

Our first implementation of nommu UML used a modified version of the
(userspace) standard library (musl-libc), without zpoline.  We
reimplemented the syscall wrappers to call a syscall entry point
(__kernel_vsyscall) exposed via the ELF aux vector.

Like this:

static __inline long __syscall0(long n)
{
	unsigned long ret = -1;
	__asm__ __volatile__ ("call *%1" : "=a"(ret)
			: "r"(__sysinfo), "a"(n)
			: "rcx", "r11", "memory");
	return ret;
}
# __sysinfo is the address exposed via the aux vector.
# this was actually Ricardo's (in Cc) work, not mine.

https://github.com/nabla-containers/musl-libc/blob/e11be13e6abc06f7034d6b98552b5928d0ed0dfe/arch/x86_64/syscall_arch.h#L13-L20

With that, we can use unmodified binaries, but libc.so and ld.so
need to be modified, which I thought was not trivial.

My motivation to apply zpoline here is to eliminate this dependency;
with zpoline, we don't have to modify the standard library (musl).

In addition, since a NOMMU kernel shares one address space among
multiple userspace processes, we only have to prepare the trampoline
code once, while processes in a multiple-address-space model (the MMU
case) need the zpoline-related code installed on each process
invocation.  This is not a direct motivation to use zpoline here, but
a side benefit in this environment.

> So I dunno. Reading through it again there are a few minor things wrt.
> code style and debug things left over, but it's not awful ;-)

Oh, really?  I'll double-check, but it would be nice to know which
flaws you found.

> I'd also
> prefer the code to be more clearly "marked" (as nommu), perhaps putting
> new files into a nommu/ directory, or something like that. But that's
> pretty minor.

I understand.  I'm afraid there will still be multiple ifdefs, since
nommu UML relies on various parts of the existing UML infrastructure.

> Still it's in a lot of places and chances are it'll make bigger
> refactoring (like seccomp mode) harder. Perhaps if at all it should come
> after seccomp mode and use that to execute syscalls if zpoline can't be
> done, and to catch all the cases where zpoline doesn't work (you have
> that in the docs)?

A fallback mechanism after zpoline failure might be interesting.

> What do others think? Would you use it? What for?

-- Hajime
Hajime Tazaki Nov. 15, 2024, 2:54 p.m. UTC | #4
Hello Anton,

thanks for the comment.

On Fri, 15 Nov 2024 19:26:07 +0900,
Anton Ivanov wrote:

> > What do others think? Would you use it? What for?
> 
> I always thought of it as "another LKL". In that case, it can be compared
> to LKL on merit and if it is equivalent or better - go into kernel.
> 
> If there is another use case, I will be glad to hear it.

From a high-level view,

the usage is different (neither is better nor worse).
LKL is used with userspace binaries, either linked with liblinux.so or
with it substituted dynamically.  LKL has a userspace API derived from
the syscall interface, which can be used to bridge the LKL world and
the host-kernel world (not specific to a Linux host).

This patchset (nommu UML) doesn't change the usage of current UML.

From an internal implementation point of view,

both (LKL and nommu UML) use !MMU.  While LKL could be implemented with
an MMU-full configuration, we found (the last patch was back in 2021)
that it is not trivial.

LKL has no process model; it currently runs only in a single (LKL)
process, with no vfork(2) support.
nommu UML can host multiple processes, with vfork available.
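Since fork(2) depends on an MMU for copy-on-write, nommu userspace spawns processes with vfork(2) followed by exec. A minimal sketch of that pattern, assuming a POSIX libc; spawn_and_wait() is a hypothetical helper, not code from busybox or the series:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (a path) via vfork()+execv() and wait.
 * Returns the child's exit status, 127 if exec failed, or -1 on error. */
int spawn_and_wait(char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child borrows the parent's memory: only exec or _exit here. */
		execv(argv[0], argv);
		_exit(127);
	}
	/* Parent resumes once the child has exec'd or exited. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```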

the patch size is:
LKL (last v8 patch): mostly 5k lines of modifications
nommu-UML: 1.2k lines of mods.


I think they look similar (as I'm from LKL, which also uses !MMU),
but they differ in various aspects.

let me know if you wish to see more about the comparison.

-- Hajime
Lorenzo Stoakes Nov. 22, 2024, 9:33 a.m. UTC | #5
+ VMA people, mm list

On Mon, Nov 11, 2024 at 03:27:00PM +0900, Hajime Tazaki wrote:
> This is a series of patches adding nommu arch support to UML.
> Comments and opinions on it would be appreciated.

In general, while I appreciate your work and don't mean to be negative, we
in mm consistently have problems with nommu as it is a rarely-tested
more-or-less hack used for very few very old architectures and a constant
source of problems and maintenance overhead for us.

It also complicates mm code and time taken to develop new features.

So ideally we'd avoid doing anything that requires us to maintain it going
forward unless the benefits really overwhelmingly outweigh the drawbacks.

There have been various murmourings about moving towards elimination of
nommu, obviously this would entirely prevent that.

Thanks, Lorenzo

Johannes Berg Nov. 22, 2024, 9:53 a.m. UTC | #6
On Fri, 2024-11-22 at 09:33 +0000, Lorenzo Stoakes wrote:
> 
> In general, while I appreciate your work and don't mean to be negative, we
> in mm consistently have problems with nommu as it is a rarely-tested
> more-or-less hack used for very few very old architectures and a constant
> source of problems and maintenance overhead for us.
> 
> It also complicates mm code and time taken to develop new features.
> 
> So ideally we'd avoid doing anything that requires us to maintain it going
> forward unless the benefits really overwhelmingly outweigh the drawbacks.

:)

There aren't really any benefits to ARCH=um in *itself*, IMHO.

> There have been various murmourings about moving towards elimination of
> nommu, obviously this would entirely prevent that.

No objection from me, but e.g. RISC-V added nommu somewhat recently?
(+Christoph, Damien)

So we could argue the other way around and say that while we have other
architectures with nommu (like RISC-V), having ARCH=um could simplify
testing by e.g. allowing a kunit configuration in ARCH=um which is
simpler (and probably faster) to run for most people than simulating a
foreign architecture.

Anyway, I think that's where I am with my partial (and very limited)
ARCH=um maintainer role. I don't really care for having the feature in
UML itself, but if it's useful for testing nommu architectures for
someone else, it doesn't seem too problematic to support. And testing
such things is also a big part of the argument Hajime was making,
afaict.

johannes
Lorenzo Stoakes Nov. 22, 2024, 10:29 a.m. UTC | #7
On Fri, Nov 22, 2024 at 10:53:18AM +0100, Johannes Berg wrote:
> On Fri, 2024-11-22 at 09:33 +0000, Lorenzo Stoakes wrote:
> >
> > In general, while I appreciate your work and don't mean to be negative, we
> > in mm consistently have problems with nommu as it is a rarely-tested
> > more-or-less hack used for very few very old architectures and a constant
> > source of problems and maintenance overhead for us.
> >
> > It also complicates mm code and time taken to develop new features.
> >
> > So ideally we'd avoid doing anything that requires us to maintain it going
> > forward unless the benefits really overwhelmingly outweigh the drawbacks.
>
> :)
>
> There aren't really any benefits to ARCH=um in *itself*, IMHO.
>
> > There have been various murmourings about moving towards elimination of
> > nommu, obviously this would entirely prevent that.
>
> No objection from me, but e.g. RISC-V added nommu somewhat recently?
> (+Christoph, Damien)

I mean it's not my place to object to this of course, but ideally we'd
avoid supporting the truly low spec RISC-V arches which do not have MMUs (I
wasn't aware there were some but I am wholly unfamiliar with RISC-V so
plead ignorance!)

>
> So we could argue the other way around and say that while we have other
> architectures with nommu (like RISC-V), having ARCH=um could simplify
> testing by e.g. allowing a kunit configuration in ARCH=um which is
> simpler (and probably faster) to run for most people than simulating a
> foreign architecture.

Yeah and this is the flip side of the coin, I mean it's actually very
useful to be able to test nommu stuff easily (I've had real issues getting
nommu m68k working in qemu for instance), but my concern is by adding more
dependency on this mechanism it makes it harder to remove later.

I would support this if in future there wouldn't be too much objection to
_this_ feature being removed should we come to a point where nommu removal
happens.

If a large part of the motivation is testing nommu arches, and we at some
point eliminate them, then I think hopefully given this would in that case
be the raison d'etre for the effort it'd not be too egregious to remove at
this point.

In which case, the flip side of the coin is that I am in fact positive
about the testing possibilities here :)

>
> Anyway, I think that's where I am with my partial (and very limited)
> ARCH=um maintainer role. I don't really care for having the feature in
> UML itself, but if it's useful for testing nommu architectures for
> someone else, it doesn't seem too problematic to support. And testing
> such things is also a big part of the argument Hajime was making,
> afaict.
>
> johannes
>

Thanks, and again I don't mean to be negative or difficult about this
series, I just want to raise the fact that 'in the wind' so to speak there
is desire to eliminate nommu at some point.

How realistic that desire is, I am not sure...

Cheers, Lorenzo
Christoph Hellwig Nov. 22, 2024, 12:18 p.m. UTC | #8
Maybe I'm missing something, but where does this discussion about
killing nommu even come from?  Nommu is a long-standing and reasonably
well-maintained part of the kernel, why would anyone want to kill it
for no good reason?  I know quite a lot of products shipping it.

Btw, nommu UML certainly sounds interesting to me, at least indirectly.
I have a project for next year or so for which the linux kernel library
or something like it would be useful to run an in-kernel workload as
a user space process if needed.  nommu uml sounds like a really good
base for that as there basically won't be any userspace that needs
memory protection to start with.
Lorenzo Stoakes Nov. 22, 2024, 12:25 p.m. UTC | #9
On Fri, Nov 22, 2024 at 01:18:26PM +0100, Christoph Hellwig wrote:
> Maybe I'm missing something, but where does this discussion about
> killing nommu even come from?  Nommu is a long-standing and reasonably
> well-maintained part of the kernel, why would anyone want to kill it
> for no good reason?  I know quite a lot of products shipping it.

It's an ongoing maintenance burden, discussions about seeing whether it's
feasible to remove it have been had in multiple places.

I have personally run into issues having to accommodate it on numerous
occasions, as have many others.

I'd be interested to know which products specifically ship this and also
require tip kernel, perhaps this is just a case of my not being aware of
certain architectures?

My impression was that only legacy architectures specifically needed this,
but I'm happy to stand corrected.

The discussion that prompted this is specifically around m68k, over at [0].

[0]: https://lore.kernel.org/all/9be80a9f-1587-4e8a-98cb-edf4920e587e@lucifer.local/

Christoph Hellwig Nov. 22, 2024, 12:38 p.m. UTC | #10
On Fri, Nov 22, 2024 at 12:25:19PM +0000, Lorenzo Stoakes wrote:
> It's an ongoing maintenance burden, discussions about seeing whether it's
> feasible to remove it have been had in multiple places.
> 
> I have personally run into issues having to accommodate it on numerous
> occasions, as have many others.
> 
> I'd be interested to know which products specifically ship this and also
> require a tip kernel; perhaps this is just a case of my not being aware of
> certain architectures?

I can't name the products I know of, as I know of them on a commercial
basis.  Most of them are Arm-based, but I also know about at least one
RISC-V one.  They all used the latest long-term stable kernel at the time
of release and tend to stay on that.  And the involved vendors keep
spinning out new versions of these every few years.
Damien Le Moal Nov. 22, 2024, 12:49 p.m. UTC | #11
On 11/22/24 21:38, Christoph Hellwig wrote:
> On Fri, Nov 22, 2024 at 12:25:19PM +0000, Lorenzo Stoakes wrote:
>> It's an ongoing maintenance burden, discussions about seeing whether it's
>> feasible to remove it have been had in multiple places.
>>
>> I have personally run into issues having to accommodate it on numerous
>> occasions, as have many others.
>>
>> I'd be interested to know which products specifically ship this and also
>> require a tip kernel; perhaps this is just a case of my not being aware of
>> certain architectures?
> 
> I can't name the products I know of, as I know of them on a commercial
> basis.  Most of them are Arm-based, but I also know about at least one
> RISC-V one.  They all used the latest long-term stable kernel at the time
> of release and tend to stay on that.  And the involved vendors keep
> spinning out new versions of these every few years.

To add to this, we had a discussion at the RISC-V microconference at Plumbers
last year (I think it was) about removing the K210 RISC-V SoC and the associated
RISC-V NOMMU support. But several people objected, because several FPGAs
implementing RISC-V cores are NOMMU (for obvious reasons in the FPGA case). So
NOMMU is being used out there.
Lorenzo Stoakes Nov. 22, 2024, 12:52 p.m. UTC | #12
On Fri, Nov 22, 2024 at 12:49:45PM +0000, Damien Le Moal wrote:
> On 11/22/24 21:38, Christoph Hellwig wrote:
> > On Fri, Nov 22, 2024 at 12:25:19PM +0000, Lorenzo Stoakes wrote:
> >> It's an ongoing maintenance burden, discussions about seeing whether it's
> >> feasible to remove it have been had in multiple places.
> >>
> >> I have personally run into issues having to accommodate it on numerous
> >> occasions, as have many others.
> >>
> >> I'd be interested to know which products specifically ship this and also
> >> require a tip kernel; perhaps this is just a case of my not being aware of
> >> certain architectures?
> >
> > I can't name the products I know of, as I know of them on a commercial
> > basis.  Most of them are Arm-based, but I also know about at least one
> > RISC-V one.  They all used the latest long-term stable kernel at the time
> > of release and tend to stay on that.  And the involved vendors keep
> > spinning out new versions of these every few years.
>
> To add to this, we had a discussion at the RISC-V microconference at Plumbers
> last year (I think it was) about removing the K210 RISC-V SoC and the associated
> RISC-V NOMMU support. But several people objected, because several FPGAs
> implementing RISC-V cores are NOMMU (for obvious reasons in the FPGA case). So
> NOMMU is being used out there.

Thanks guys, I appreciate the input; this has made me aware of things I
simply was not aware of before.

In that case, I am actually rather in favour of this series, as it makes it
easier to test nommu things :)

>
> --
> Damien Le Moal
> Western Digital Research
David Gow Nov. 23, 2024, 7:27 a.m. UTC | #13
On Mon, 11 Nov 2024 at 14:27, Hajime Tazaki <thehajime@gmail.com> wrote:
>
> This is a series of patches adding a nommu arch to UML.  I would
> appreciate comments/opinions on it.
>
> There are still several limitations/issues which we have already found;
> here is the list of those issues.
>
> - prompt configured with /etc/profile is broken (variables are not
>   expanded, ${HOSTNAME%%.*}:$PWD#)
> - there is no mechanism implemented to cache the mapped memory of
>   exec(2); thus files are always read from the filesystem upon every
>   exec, which makes some benchmarks (lmbench) slow.
>
> -- Hajime
>

Thanks for sending this in!

I had a chance to give this a proper try with KUnit, and I think it'd be
a great option to have available: it's certainly nice to have a fast,
easy nommu architecture for testing.

I'd echo the comments from others that, at least for the testing case,
it doesn't make much sense to go to the lengths of using the fancy
zpoline patching (as neat as it is) compared to a simpler, but slower,
seccomp-based approach. It'd be nicer to have a simpler, more robust
implementation first, and if there's a particular reason to want to
speed it up later, zpoline can be added as an option.

Plus, if we can avoid the need for vm.mmap_min_addr, that'd make it
much easier to run the nommu tests alongside all the regular UML ones,
as none would require either root, or an otherwise particularly
special config.

Cheers,
-- David

Hajime Tazaki Nov. 24, 2024, 1:25 a.m. UTC | #14
Hello David,

On Sat, 23 Nov 2024 16:27:27 +0900,
David Gow wrote:

> I had a chance to give this a proper try with KUnit, and I think it'd be
> a great option to have available: it's certainly nice to have a fast,
> easy nommu architecture for testing.

Thanks for testing.

> I'd echo the comments from others that, at least for the testing case,
> it doesn't make much sense to go to the lengths of using the fancy
> zpoline patching (as neat as it is) compared to a simpler, but slower,
> seccomp-based approach. It'd be nicer to have a simpler, more robust
> implementation first, and if there's a particular reason to want to
> speed it up later, zpoline can be added as an option.

I'll start exploring the feasibility of this option under nommu and
will get back to you here.

> Plus, if we can avoid the need for vm.mmap_min_addr, that'd make it
> much easier to run the nommu tests alongside all the regular UML ones,
> as none would require either root, or an otherwise particularly
> special config.

Though I didn't think this limitation has much impact, we'll also
experiment to see whether this (not relying on mmap_min_addr) is possible.


Thanks for the feedback!

-- Hajime