Message ID | 20200612014606.147691-1-jkz@google.com |
---|---|
Headers | show |
Series | linux-user: Support extended clone(CLONE_VM) | expand |
Patchew URL: https://patchew.org/QEMU/20200612014606.147691-1-jkz@google.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === +++ /tmp/qemu-test/build/tests/qemu-iotests/041.out.bad 2020-06-12 03:41:14.015076859 +0000 @@ -1,5 +1,30 @@ -........................................................................................................ +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.YHsXXgjzyL/qemu-22509-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.YHsXXgjzyL/qemu-22509-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=base -drive if=ide,id=drive1,media=cdrom +......................................E................................................................. +====================================================================== +ERROR: test_pause (__main__.TestSingleBlockdev) +---------------------------------------------------------------------- +Traceback (most recent call last): + File "041", line 108, in test_pause --- Ran 104 tests -OK +FAILED (errors=1) TEST iotest-qcow2: 042 TEST iotest-qcow2: 043 TEST iotest-qcow2: 046 --- Not run: 259 Failures: 041 Failed 1 of 119 iotests make: *** [check-tests/check-block.sh] Error 1 Traceback (most recent call last): File "./tests/docker/docker.py", line 665, in <module> sys.exit(main()) --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=0b953bb4c5534a2f9f54bbb077c4a060', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-fy1xwr_z/src/docker-src.2020-06-11-23.31.14.29203:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=0b953bb4c5534a2f9f54bbb077c4a060 make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fy1xwr_z/src' make: *** [docker-run-test-quick@centos7] Error 2 real 17m15.344s user 0m8.554s The full log is available at http://patchew.org/logs/20200612014606.147691-1-jkz@google.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Josh Kunz <jkz@google.com> writes: > This patch series implements extended support for the `clone` system > call. As best I can tell, any option combination including `CLONE_VM` > should be supported with the addition of this patch series. The > implementation is described in greater detail in the patches themselves. > > Testing: > > * All targets built on x86_64. > * `make check` and `make check-tcg` are passing. Additional tests have > been added to `linux-test.c` to validate clone behavior. > > Caveats: > > * This series touches, but does not fix, several bits of code that are > racey (namely the sigact table and the fd trans table). > * `exit_group` does not perform the appropriate cleanup for non-thread > children created with `CLONE_VM`. CPUs for such children are never > cleaned up. The correct implementation of exit-group is non-trivial > (since it also needs to track/handle cleanup for threads in the > clone'd child process). Also, I don't fully understand the > interaction between QOM<->linux-user. When the QOM object gets unrefed for the final time it should cause a bunch of clean-up in the common vCPU management code where things like plugin cleanup are done. This was recently touched in 1f81ce90e31ef338ee53a0cea02344237bc470cc where I removed linux-user messing around with the active cpu list and left it to the core code to deal with. Previously it wasn't being properly unrealized. > My naive implementation based > on the current implementation `exit(2)` was regularly crashing. If > maintainers have suggestions for better ways to handle exit_group, > they would be greatly appreciated. > * execve does not clean up the CPUs of clone'd children, for the same > reasons as `exit_group`. > > Josh Kunz (5): > linux-user: Refactor do_fork to use new `qemu_clone` > linux-user: Make fd_trans task-specific. > linux-user: Make sigact_table part of the task state. > linux-user: Support CLONE_VM and extended clone options > linux-user: Add PDEATHSIG test for clone process hierarchy. > > linux-user/Makefile.objs | 2 +- > linux-user/clone.c | 565 ++++++++++++++++++++++++++++ > linux-user/clone.h | 27 ++ > linux-user/fd-trans-tbl.c | 13 + > linux-user/fd-trans-type.h | 17 + > linux-user/fd-trans.c | 3 - > linux-user/fd-trans.h | 75 ++-- > linux-user/main.c | 1 + > linux-user/qemu.h | 49 +++ > linux-user/signal.c | 84 ++++- > linux-user/syscall.c | 452 ++++++++++++---------- > tests/tcg/multiarch/Makefile.target | 3 + > tests/tcg/multiarch/linux-test.c | 227 ++++++++++- > 13 files changed, 1264 insertions(+), 254 deletions(-) > create mode 100644 linux-user/clone.c > create mode 100644 linux-user/clone.h > create mode 100644 linux-user/fd-trans-tbl.c > create mode 100644 linux-user/fd-trans-type.h
Josh Kunz <jkz@google.com> writes: > This patch series implements extended support for the `clone` system > call. As best I can tell, any option combination including `CLONE_VM` > should be supported with the addition of this patch series. The > implementation is described in greater detail in the patches themselves. > > Testing: > > * All targets built on x86_64. > * `make check` and `make check-tcg` are passing. Additional tests have > been added to `linux-test.c` to validate clone behavior. Not for me - which probably means you don't have docker/podman setup or cross compilers for all the various guests we have. See tests/tcg/configure.sh > > Caveats: > > * This series touches, but does not fix, several bits of code that are > racey (namely the sigact table and the fd trans table). > * `exit_group` does not perform the appropriate cleanup for non-thread > children created with `CLONE_VM`. CPUs for such children are never > cleaned up. The correct implementation of exit-group is non-trivial > (since it also needs to track/handle cleanup for threads in the > clone'd child process). Also, I don't fully understand the > interaction between QOM<->linux-user. My naive implementation based > on the current implementation `exit(2)` was regularly crashing. If > maintainers have suggestions for better ways to handle exit_group, > they would be greatly appreciated. > * execve does not clean up the CPUs of clone'd children, for the same > reasons as `exit_group`. Apart from "a more perfect emulation" is there a particular use case served by the extra functionality? AIUI up until this point we've basically supported glibc's use of clone() which has generally been enough. I'm assuming you've come across stuff that needs this more fine grained support? > > Josh Kunz (5): > linux-user: Refactor do_fork to use new `qemu_clone` > linux-user: Make fd_trans task-specific. > linux-user: Make sigact_table part of the task state. > linux-user: Support CLONE_VM and extended clone options > linux-user: Add PDEATHSIG test for clone process hierarchy. > > linux-user/Makefile.objs | 2 +- > linux-user/clone.c | 565 ++++++++++++++++++++++++++++ > linux-user/clone.h | 27 ++ > linux-user/fd-trans-tbl.c | 13 + > linux-user/fd-trans-type.h | 17 + > linux-user/fd-trans.c | 3 - > linux-user/fd-trans.h | 75 ++-- > linux-user/main.c | 1 + > linux-user/qemu.h | 49 +++ > linux-user/signal.c | 84 ++++- > linux-user/syscall.c | 452 ++++++++++++---------- > tests/tcg/multiarch/Makefile.target | 3 + > tests/tcg/multiarch/linux-test.c | 227 ++++++++++- > 13 files changed, 1264 insertions(+), 254 deletions(-) > create mode 100644 linux-user/clone.c > create mode 100644 linux-user/clone.h > create mode 100644 linux-user/fd-trans-tbl.c > create mode 100644 linux-user/fd-trans-type.h
On Tue, 16 Jun 2020 at 17:08, Alex Bennée <alex.bennee@linaro.org> wrote: > Apart from "a more perfect emulation" is there a particular use case > served by the extra functionality? AIUI up until this point we've > basically supported glibc's use of clone() which has generally been > enough. I'm assuming you've come across stuff that needs this more fine > grained support? There are definitely cases we don't handle that cause problems; notably https://bugs.launchpad.net/qemu/+bug/1673976 reports that newer glibc implement posix_spawn() using CLONE_VM|CLONE_VFORK which we don't handle correctly (though it is now just "we don't report failures correctly" rather than "guest asserts"). The problem has always been that glibc implicitly assumes it knows what the process's threads are like, ie that it is the only thing doing any clone()s. (The comment in syscall.c mentions it "breaking mutexes" though I forget what I had in mind when I wrote that comment.) I haven't looked at these patches, but the risk of being clever is that we end up implicitly depending on details of glibc's internal implementation in a potentially fragile way. I forget whether QEMU can build against musl libc, but if we do then that might be an interesting test of whether we have accidental dependencies on the libc internals. thanks -- PMM
On Tue, Jun 16, 2020 at 9:32 AM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Tue, 16 Jun 2020 at 17:08, Alex Bennée <alex.bennee@linaro.org> wrote: > > Apart from "a more perfect emulation" is there a particular use case > > served by the extra functionality? AIUI up until this point we've > > basically supported glibc's use of clone() which has generally been > > enough. I'm assuming you've come across stuff that needs this more fine > > grained support? > > There are definitely cases we don't handle that cause problems; > notably https://bugs.launchpad.net/qemu/+bug/1673976 reports > that newer glibc implement posix_spawn() using CLONE_VM|CLONE_VFORK > which we don't handle correctly (though it is now just "we don't > report failures correctly" rather than "guest asserts"). This originally came up for us at Google in multi-threaded guest binaries using TCMalloc (https://github.com/google/tcmalloc). TCMalloc does not have any special `at_fork` handling, so guest processes using TCMalloc spawn subprocesses using a custom bit of code based on `clone(CLONE_VM)` (notably *not* vfork()). We've also been using this patch to work around similar issues in QEMU itself when creating subprocesses with fork()/vfork(). Since QEMU (and GLib) rely on locks to emulate multi-threaded guests that share virtual memory, QEMU itself is also vulnerable to deadlocks when a guest forks. Without this patch we've run into deadlocks with TCG region trees, and GLib's `g_malloc`, there are likely others as well, since we could still reproduce the deadlocks after fixing these specific problems. The issues caused by using fork() in multi-threaded guests are pretty tricky to debug. They are fundamentally data races (was another thread in the critical section or not?), and they usually manifest as deadlocks, which show up as timeouts or hangs to users. I suspect this issue happens frequently in the wild, but at a low enough rate/user that nobody bothered fixing it/reporting it yet. Use of `vfork()` with `CLONE_VM` is common as mentioned by Alex. For example it is the only way to spawn subprocesses in Go on most platforms: https://github.com/golang/go/blob/master/src/syscall/exec_linux.go#L218 I tried to come up with a good reproducer for these issues, but I haven't been able to make one. The cases we have at Google that trigger this issue reliably are big and they contain lots of code I can't share. When compiling QEMU itself with TCMalloc, I can pretty reliably reproduce a deadlock with a program that just calls vfork(), but that's somewhat synthetic. > The problem has always been that glibc implicitly assumes it > knows what the process's threads are like, ie that it is the > only thing doing any clone()s. (The comment in syscall.c mentions > it "breaking mutexes" though I forget what I had in mind when > I wrote that comment.) I haven't looked at these patches, > but the risk of being clever is that we end up implicitly > depending on details of glibc's internal implementation in a > potentially fragile way. > > > I forget whether QEMU can build against musl libc, but if we do > then that might be an interesting test of whether we have > accidental dependencies on the libc internals. I agree it would be interesting to test against musl. I'm pretty sure it would work (this patch only relies on POSIX APIs + Platform ABIs for TLS), but it would be interesting to confirm. -- Josh Kunz