Message ID | 20131008004224.509422315@amt.cnet |
---|---|
State | New |
On 08/10/2013 02:41, Marcelo Tosatti wrote:
> +        /* unblock SIGBUS */
> +        pthread_sigmask(SIG_BLOCK, NULL, &oldset);
> +        sigemptyset(&set);
> +        sigaddset(&set, SIGBUS);
> +        pthread_sigmask(SIG_UNBLOCK, &set, NULL);

Please instead modify qemu-thread-posix.c to unblock all per-thread
signals (SIGBUS, SIGSEGV, SIGILL, SIGFPE and SIGSYS). There is no need
to keep those blocked.

Paolo

On Tue, Oct 08, 2013 at 10:03:48AM +0200, Paolo Bonzini wrote:
> On 08/10/2013 02:41, Marcelo Tosatti wrote:
> > +        /* unblock SIGBUS */
> > +        pthread_sigmask(SIG_BLOCK, NULL, &oldset);
> > +        sigemptyset(&set);
> > +        sigaddset(&set, SIGBUS);
> > +        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
>
> Please instead modify qemu-thread-posix.c to unblock all per-thread
> signals (SIGBUS, SIGSEGV, SIGILL, SIGFPE and SIGSYS). There is no need
> to keep those blocked.
>
> Paolo

main-loop.c handles SIGBUS via signalfd to emulate MCEs (associated
commits). Therefore it must be blocked.

Note that what this patch does is to maintain the signal handling state
(it saves the previous state, modifies state, restores previous state) so
that it's unchanged.

On 08/10/2013 23:51, Marcelo Tosatti wrote:
> On Tue, Oct 08, 2013 at 10:03:48AM +0200, Paolo Bonzini wrote:
>> On 08/10/2013 02:41, Marcelo Tosatti wrote:
>>> +        /* unblock SIGBUS */
>>> +        pthread_sigmask(SIG_BLOCK, NULL, &oldset);
>>> +        sigemptyset(&set);
>>> +        sigaddset(&set, SIGBUS);
>>> +        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
>>
>> Please instead modify qemu-thread-posix.c to unblock all per-thread
>> signals (SIGBUS, SIGSEGV, SIGILL, SIGFPE and SIGSYS). There is no need
>> to keep those blocked.
>
> main-loop.c handles SIGBUS via signalfd to emulate MCEs (associated
> commits). Therefore it must be blocked.

How was that tested? For BUS_MCEERR_AO it can work, but BUS_MCEERR_AR
calls force_sig_info which does this:

        ignored = action->sa.sa_handler == SIG_IGN;
        blocked = sigismember(&t->blocked, sig);
        if (blocked || ignored) {
                action->sa.sa_handler = SIG_DFL;
                if (blocked) {
                        sigdelset(&t->blocked, sig);
                        recalc_sigpending_and_wake(t);
                }
        }
        if (action->sa.sa_handler == SIG_DFL)
                t->signal->flags &= ~SIGNAL_UNKILLABLE;

and kills the process (because that's the default action of SIG_DFL).

> Note that what this patch does is to maintain the signal handling state
> (it saves the previous state, modifies state, restores previous state) so
> that it's unchanged.

Yes, understood. I was missing the part about MCE (I knew it used
SIGBUS, but forgot about signalfd). So this patch is good, but the
above point about BUS_MCEERR_AR needs to be checked sooner or later.

Paolo

On Wed, Oct 09, 2013 at 10:05:44AM +0200, Paolo Bonzini wrote:
> On 08/10/2013 23:51, Marcelo Tosatti wrote:
> > On Tue, Oct 08, 2013 at 10:03:48AM +0200, Paolo Bonzini wrote:
> >> On 08/10/2013 02:41, Marcelo Tosatti wrote:
> >>> +        /* unblock SIGBUS */
> >>> +        pthread_sigmask(SIG_BLOCK, NULL, &oldset);
> >>> +        sigemptyset(&set);
> >>> +        sigaddset(&set, SIGBUS);
> >>> +        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
> >>
> >> Please instead modify qemu-thread-posix.c to unblock all per-thread
> >> signals (SIGBUS, SIGSEGV, SIGILL, SIGFPE and SIGSYS). There is no need
> >> to keep those blocked.
> >
> > main-loop.c handles SIGBUS via signalfd to emulate MCEs (associated
> > commits). Therefore it must be blocked.
>
> How was that tested? For BUS_MCEERR_AO it can work, but BUS_MCEERR_AR
> calls force_sig_info which does this:
>
>         ignored = action->sa.sa_handler == SIG_IGN;
>         blocked = sigismember(&t->blocked, sig);
>         if (blocked || ignored) {
>                 action->sa.sa_handler = SIG_DFL;
>                 if (blocked) {
>                         sigdelset(&t->blocked, sig);
>                         recalc_sigpending_and_wake(t);
>                 }
>         }
>         if (action->sa.sa_handler == SIG_DFL)
>                 t->signal->flags &= ~SIGNAL_UNKILLABLE;
>
> and kills the process (because that's the default action of SIG_DFL).

For vcpu context it's not blocked?

> > Note that what this patch does is to maintain the signal handling state
> > (it saves the previous state, modifies state, restores previous state) so
> > that it's unchanged.
>
> Yes, understood. I was missing the part about MCE (I knew it used
> SIGBUS, but forgot about signalfd). So this patch is good, but the
> above point about BUS_MCEERR_AR needs to be checked sooner or later.
>
> Paolo

On 09/10/2013 21:41, Marcelo Tosatti wrote:
>> > How was that tested? For BUS_MCEERR_AO it can work, but BUS_MCEERR_AR
>> > calls force_sig_info which does this:
>> >
>> >         ignored = action->sa.sa_handler == SIG_IGN;
>> >         blocked = sigismember(&t->blocked, sig);
>> >         if (blocked || ignored) {
>> >                 action->sa.sa_handler = SIG_DFL;
>> >                 if (blocked) {
>> >                         sigdelset(&t->blocked, sig);
>> >                         recalc_sigpending_and_wake(t);
>> >                 }
>> >         }
>> >         if (action->sa.sa_handler == SIG_DFL)
>> >                 t->signal->flags &= ~SIGNAL_UNKILLABLE;
>> >
>> > and kills the process (because that's the default action of SIG_DFL).
> For vcpu context it's not blocked?

It causes KVM to exit back to userspace, but as soon as KVM exits it
should be blocked. Thus a SIGBUS with BUS_MCEERR_AR will never be
returned by sigtimedwait.

Paolo

On Wed, Oct 09, 2013 at 11:26:58PM +0200, Paolo Bonzini wrote:
> On 09/10/2013 21:41, Marcelo Tosatti wrote:
> >> > How was that tested? For BUS_MCEERR_AO it can work, but BUS_MCEERR_AR
> >> > calls force_sig_info which does this:
> >> >
> >> >         ignored = action->sa.sa_handler == SIG_IGN;
> >> >         blocked = sigismember(&t->blocked, sig);
> >> >         if (blocked || ignored) {
> >> >                 action->sa.sa_handler = SIG_DFL;
> >> >                 if (blocked) {
> >> >                         sigdelset(&t->blocked, sig);
> >> >                         recalc_sigpending_and_wake(t);
> >> >                 }
> >> >         }
> >> >         if (action->sa.sa_handler == SIG_DFL)
> >> >                 t->signal->flags &= ~SIGNAL_UNKILLABLE;
> >> >
> >> > and kills the process (because that's the default action of SIG_DFL).
> > For vcpu context it's not blocked?
>
> It causes KVM to exit back to userspace, but as soon as KVM exits it
> should be blocked. Thus a SIGBUS with BUS_MCEERR_AR will never be
> returned by sigtimedwait.

It's blocked but readable via signalfd. It's generated when a vcpu touches
memory, see 77db5cbd29b7cb0e0fb4fd14.

Since it's rarely used, reviewing the code is not a bad idea. For the test,
see https://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg01588.html.

On 09/10/2013 23:26, Paolo Bonzini wrote:
> On 09/10/2013 21:41, Marcelo Tosatti wrote:
>>>> How was that tested? For BUS_MCEERR_AO it can work, but BUS_MCEERR_AR
>>>> calls force_sig_info which does this:
>>>>
>>>>         ignored = action->sa.sa_handler == SIG_IGN;
>>>>         blocked = sigismember(&t->blocked, sig);
>>>>         if (blocked || ignored) {
>>>>                 action->sa.sa_handler = SIG_DFL;
>>>>                 if (blocked) {
>>>>                         sigdelset(&t->blocked, sig);
>>>>                         recalc_sigpending_and_wake(t);
>>>>                 }
>>>>         }
>>>>         if (action->sa.sa_handler == SIG_DFL)
>>>>                 t->signal->flags &= ~SIGNAL_UNKILLABLE;
>>>>
>>>> and kills the process (because that's the default action of SIG_DFL).
>> For vcpu context it's not blocked?
>
> It causes KVM to exit back to userspace, but as soon as KVM exits it
> should be blocked.

... but it's been queued and this bypasses the checks in force_sig_info.
So in guest mode it is accepted, in QEMU mode it causes a SIGBUS.

Paolo

On 08/10/2013 02:41, Marcelo Tosatti wrote:
> MAP_POPULATE mmap flag does not cause mmap to fail if allocation
> of the entire area is not performed. HugeTLBfs performs reservation
> of pages on a global basis: any further restriction to the reserved memory
> such as cpusets placement or numa node policy is performed at
> fault time only.
>
> Manually fault in pages at allocation time. This allows memory restrictions
> to be applied before guest initialization.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: qemu/exec.c
> ===================================================================
> --- qemu.orig/exec.c
> +++ qemu/exec.c

Please produce individual patches with git format-patch. This lets "git
am" do a 3-way merge, and would ignore automatically generated files
such as qemu-options.def.

Also:

> @@ -918,6 +918,13 @@ static long gethugepagesize(const char *
>      return fs.f_bsize;
>  }
>
> +sigjmp_buf sigjump;

Please make this static.

> +
> +static void sigbus_handler(int signal)
> +{
> +    siglongjmp(sigjump, 1);
> +}
> +
>  static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              const char *path)
> @@ -927,9 +934,6 @@ static void *file_ram_alloc(RAMBlock *bl
>      char *c;
>      void *area;
>      int fd;
> -#ifdef MAP_POPULATE
> -    int flags;
> -#endif
>      unsigned long hpagesize;
>
>      hpagesize = gethugepagesize(path);
> @@ -977,21 +981,57 @@ static void *file_ram_alloc(RAMBlock *bl
>      if (ftruncate(fd, memory))
>          perror("ftruncate");
>
> -#ifdef MAP_POPULATE
> -    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> -     * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
> -     * to sidestep this quirk.
> -     */
> -    flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
> -    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
> -#else
>      area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> -#endif
>      if (area == MAP_FAILED) {
>          perror("file_ram_alloc: can't mmap RAM pages");
>          close(fd);
>          return (NULL);
>      }
> +
> +    if (mem_prealloc) {
> +        int ret, i;
> +        struct sigaction act, oldact;
> +        sigset_t set, oldset;
> +
> +        memset(&act, 0, sizeof(act));
> +        act.sa_handler = &sigbus_handler;
> +        act.sa_flags = 0;
> +
> +        ret = sigaction(SIGBUS, &act, &oldact);
> +        if (ret) {
> +            perror("file_ram_alloc: fail to install signal handler");
> +            exit(1);
> +        }
> +
> +        /* unblock SIGBUS */
> +        pthread_sigmask(SIG_BLOCK, NULL, &oldset);

This is not needed, just pass &oldset in the SIG_UNBLOCK call below.

> +        sigemptyset(&set);
> +        sigaddset(&set, SIGBUS);
> +        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
> +
> +        if (sigsetjmp(sigjump, 1)) {
> +            fprintf(stderr, "file_ram_alloc: failed to preallocate pages\n");
> +            exit(1);
> +        }
> +
> +        /* MAP_POPULATE silently ignores failures */
> +        for (i = 0; i < (memory/hpagesize)-1; i++) {
> +            memset(area + (hpagesize*i), 0, 1);
> +        }
> +
> +        ret = sigaction(SIGBUS, &oldact, NULL);
> +        if (ret) {
> +            perror("file_ram_alloc: fail to reinstall signal handler");
> +            exit(1);
> +        }
> +
> +        if (sigismember(&oldset, SIGBUS)) {
> +            sigemptyset(&set);
> +            sigaddset(&set, SIGBUS);
> +            pthread_sigmask(SIG_BLOCK, &set, NULL);
> +        }

Just use SIG_SETMASK with oldset, unconditionally.

Ok with these changes.

Paolo

> +    }
> +
>      block->fd = fd;
>      return area;
>  }
> Index: qemu/vl.c
> ===================================================================
> --- qemu.orig/vl.c
> +++ qemu/vl.c
> @@ -188,9 +188,7 @@ static int display_remote;
>  const char* keyboard_layout = NULL;
>  ram_addr_t ram_size;
>  const char *mem_path = NULL;
> -#ifdef MAP_POPULATE
>  int mem_prealloc = 0; /* force preallocation of physical target memory */
> -#endif
>  int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int autostart;
> @@ -3205,11 +3203,9 @@ int main(int argc, char **argv, char **e
>              case QEMU_OPTION_mempath:
>                  mem_path = optarg;
>                  break;
> -#ifdef MAP_POPULATE
>              case QEMU_OPTION_mem_prealloc:
>                  mem_prealloc = 1;
>                  break;
> -#endif
>              case QEMU_OPTION_d:
>                  log_mask = optarg;
>                  break;
> Index: qemu/qemu-options.def
> ===================================================================
> --- qemu.orig/qemu-options.def
> +++ qemu/qemu-options.def
> @@ -66,11 +66,9 @@ stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_
>  DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
>      "-mem-path FILE provide backing storage for guest RAM\n", QEMU_ARCH_ALL)
>
> -#ifdef MAP_POPULATE
>  DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
>      "-mem-prealloc preallocate guest memory (use with -mem-path)\n",
>      QEMU_ARCH_ALL)
> -#endif
>
>  DEF("k", HAS_ARG, QEMU_OPTION_k,
>      "-k language use keyboard layout (for example 'fr' for French)\n",
> Index: git/qemu/qemu-options.hx
> ===================================================================
> --- qemu.orig/qemu-options.hx
> +++ qemu/qemu-options.hx
> @@ -228,7 +228,6 @@ STEXI
>  Allocate guest RAM from a temporarily created file in @var{path}.
>  ETEXI
>
> -#ifdef MAP_POPULATE
>  DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
>      "-mem-prealloc preallocate guest memory (use with -mem-path)\n",
>      QEMU_ARCH_ALL)
> @@ -237,7 +236,6 @@ STEXI
>  @findex -mem-prealloc
>  Preallocate memory when using -mem-path.
>  ETEXI
> -#endif
>
>  DEF("k", HAS_ARG, QEMU_OPTION_k,
>      "-k language use keyboard layout (for example 'fr' for French)\n",

On 8 October 2013 01:41, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> +        ret = sigaction(SIGBUS, &oldact, NULL);
> +        if (ret) {
> +            perror("file_ram_alloc: fail to reinstall signal handler");

"failed".

thanks
-- PMM

Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c
+++ qemu/exec.c
@@ -918,6 +918,13 @@ static long gethugepagesize(const char *
     return fs.f_bsize;
 }

+sigjmp_buf sigjump;
+
+static void sigbus_handler(int signal)
+{
+    siglongjmp(sigjump, 1);
+}
+
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path)
@@ -927,9 +934,6 @@ static void *file_ram_alloc(RAMBlock *bl
     char *c;
     void *area;
     int fd;
-#ifdef MAP_POPULATE
-    int flags;
-#endif
     unsigned long hpagesize;

     hpagesize = gethugepagesize(path);
@@ -977,21 +981,57 @@ static void *file_ram_alloc(RAMBlock *bl
     if (ftruncate(fd, memory))
         perror("ftruncate");

-#ifdef MAP_POPULATE
-    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
-     * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
-     * to sidestep this quirk.
-     */
-    flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
-    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
-#else
     area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
-#endif
     if (area == MAP_FAILED) {
         perror("file_ram_alloc: can't mmap RAM pages");
         close(fd);
         return (NULL);
     }
+
+    if (mem_prealloc) {
+        int ret, i;
+        struct sigaction act, oldact;
+        sigset_t set, oldset;
+
+        memset(&act, 0, sizeof(act));
+        act.sa_handler = &sigbus_handler;
+        act.sa_flags = 0;
+
+        ret = sigaction(SIGBUS, &act, &oldact);
+        if (ret) {
+            perror("file_ram_alloc: fail to install signal handler");
+            exit(1);
+        }
+
+        /* unblock SIGBUS */
+        pthread_sigmask(SIG_BLOCK, NULL, &oldset);
+        sigemptyset(&set);
+        sigaddset(&set, SIGBUS);
+        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
+
+        if (sigsetjmp(sigjump, 1)) {
+            fprintf(stderr, "file_ram_alloc: failed to preallocate pages\n");
+            exit(1);
+        }
+
+        /* MAP_POPULATE silently ignores failures */
+        for (i = 0; i < (memory/hpagesize)-1; i++) {
+            memset(area + (hpagesize*i), 0, 1);
+        }
+
+        ret = sigaction(SIGBUS, &oldact, NULL);
+        if (ret) {
+            perror("file_ram_alloc: fail to reinstall signal handler");
+            exit(1);
+        }
+
+        if (sigismember(&oldset, SIGBUS)) {
+            sigemptyset(&set);
+            sigaddset(&set, SIGBUS);
+            pthread_sigmask(SIG_BLOCK, &set, NULL);
+        }
+    }
+
     block->fd = fd;
     return area;
 }
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -188,9 +188,7 @@ static int display_remote;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
-#ifdef MAP_POPULATE
 int mem_prealloc = 0; /* force preallocation of physical target memory */
-#endif
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -3205,11 +3203,9 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_mempath:
                 mem_path = optarg;
                 break;
-#ifdef MAP_POPULATE
             case QEMU_OPTION_mem_prealloc:
                 mem_prealloc = 1;
                 break;
-#endif
             case QEMU_OPTION_d:
                 log_mask = optarg;
                 break;
Index: qemu/qemu-options.def
===================================================================
--- qemu.orig/qemu-options.def
+++ qemu/qemu-options.def
@@ -66,11 +66,9 @@ stringify(DEFAULT_RAM_SIZE) "]\n", QEMU_
 DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
     "-mem-path FILE provide backing storage for guest RAM\n", QEMU_ARCH_ALL)

-#ifdef MAP_POPULATE
 DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
     "-mem-prealloc preallocate guest memory (use with -mem-path)\n",
     QEMU_ARCH_ALL)
-#endif

 DEF("k", HAS_ARG, QEMU_OPTION_k,
     "-k language use keyboard layout (for example 'fr' for French)\n",
Index: git/qemu/qemu-options.hx
===================================================================
--- qemu.orig/qemu-options.hx
+++ qemu/qemu-options.hx
@@ -228,7 +228,6 @@ STEXI
 Allocate guest RAM from a temporarily created file in @var{path}.
 ETEXI

-#ifdef MAP_POPULATE
 DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
     "-mem-prealloc preallocate guest memory (use with -mem-path)\n",
     QEMU_ARCH_ALL)
@@ -237,7 +236,6 @@ STEXI
 @findex -mem-prealloc
 Preallocate memory when using -mem-path.
 ETEXI
-#endif

 DEF("k", HAS_ARG, QEMU_OPTION_k,
     "-k language use keyboard layout (for example 'fr' for French)\n",

MAP_POPULATE mmap flag does not cause mmap to fail if allocation
of the entire area is not performed. HugeTLBfs performs reservation
of pages on a global basis: any further restriction to the reserved memory
such as cpusets placement or numa node policy is performed at
fault time only.

Manually fault in pages at allocation time. This allows memory restrictions
to be applied before guest initialization.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
