Message ID | 20091008203740.GA20727@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > "posix-aio-compat: avoid signal race when spawning a thread" > winxp installation on a raw format file fails > during disk format, with a message "your > disk may be damaged". > > This commit moved signal mask from aio thread to creating thread. > It turns out if we keep the mask in aio thread as well, the problem > disappears. It should not be needed, but since this is harmless, let's > keep it around until someone inclined to debug pthread library internals > can check this issue. > > While we are at it, convert sigprocmask to pthread_sigmask > as per posix. Thanks for the work you've put into it, i'll try to see how all of this works on my x86 machine. [..snip..]
On Thu, Oct 08, 2009 at 10:37:40PM +0200, Michael S. Tsirkin wrote: > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > "posix-aio-compat: avoid signal race when spawning a thread" > winxp installation on a raw format file fails > during disk format, with a message "your > disk may be damaged". > > This commit moved signal mask from aio thread to creating thread. > It turns out if we keep the mask in aio thread as well, the problem > disappears. It should not be needed, but since this is harmless, let's > keep it around until someone inclined to debug pthread library internals > can check this issue. > > While we are at it, convert sigprocmask to pthread_sigmask > as per posix. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > For the benefit of whoever tries to debug this any deeper here's > documentation on how to reproduce the issue to be saved in git logs > (need to tweak paths obviously): > > The test was done on FC11, 32 bit userspace, 64 bit kernel 2.6.31. Vanilla kernel from FC11 does the same btw. > > ./configure --prefix=/scm/qemu-kvm-debug --target-list=x86_64-softmmu \ > --enable-kvm > make -j 8 && make install > rm /scm/images/winxp-bisect.raw > > /scm/qemu-kvm-debug/bin/qemu-img create -f raw \ > /scm/images/winxp-bisect.raw 2G > > /scm/qemu-kvm-debug/bin/qemu-system-x86_64 -enable-kvm -cdrom \ > /home/mst/en_windows_xp_professional_with_service_pack_3_x86_cd_x14-80428.iso \ > /scm/images/winxp-bisect.raw > > Select format to NTFS. > --- > > OK, I'm out of time with this issue, let's apply a work-around > until someone interested comes around? > > posix-aio-compat.c | 14 ++++++++++++-- > > diff --git a/posix-aio-compat.c b/posix-aio-compat.c > index 400d898..4abbe3a 100644 > --- a/posix-aio-compat.c > +++ b/posix-aio-compat.c > @@ -301,6 +301,16 @@ static size_t handle_aiocb_rw(struct qemu_paiocb *aiocb) > static void *aio_thread(void *unused) > { > pid_t pid; > + sigset_t set; > + > + /* block all signals */ > + /* Should not be necessary as we should inherit mask > + * from creating thread. However, without this, > + * on FC11, using raw file, WinXP installation fails during disk format > + * saying disk was damaged. pthread library bug? > + * */ > + if (sigfillset(&set)) die("sigfillset"); > + if (pthread_sigmask(SIG_BLOCK, &set, NULL)) die("pthread_sigmask"); By the way, things seem to work even when I comment out pthread_sigmask: just sigfillset does it. Seems pretty wild. > > pid = getpid(); > > @@ -371,11 +381,11 @@ static void spawn_thread(void) > > /* block all signals */ > if (sigfillset(&set)) die("sigfillset"); > - if (sigprocmask(SIG_SETMASK, &set, &oldset)) die("sigprocmask"); > + if (pthread_sigmask(SIG_SETMASK, &set, &oldset)) die("pthread_sigmask"); > > thread_create(&thread_id, &attr, aio_thread, NULL); > > - if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore"); > + if (pthread_sigmask(SIG_SETMASK, &oldset, NULL)) die("pthread_sigmask restore"); > } As I said before, just this second hunk alone does not help. > static void qemu_paio_submit(struct qemu_paiocb *aiocb) > -- > 1.6.5.rc2
On Fri, 9 Oct 2009, malc wrote: > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > > "posix-aio-compat: avoid signal race when spawning a thread" > > winxp installation on a raw format file fails > > during disk format, with a message "your > > disk may be damaged". > > > > This commit moved signal mask from aio thread to creating thread. > > It turns out if we keep the mask in aio thread as well, the problem > > disappears. It should not be needed, but since this is harmless, let's > > keep it around until someone inclined to debug pthread library internals > > can check this issue. > > > > While we are at it, convert sigprocmask to pthread_sigmask > > as per posix. > > Thanks for the work you've put into it, i'll try to see how all of this > works on my x86 machine. > > [..snip..] Nope, still can not reproduce... So indeed it appears someone is needed, who: a. Can reproduce b. Cares FWIW, in essence sigfillset is this: # define __sigfillset(set) \ (__extension__ ({ int __cnt = _SIGSET_NWORDS; \ sigset_t *__set = (set); \ while (--__cnt >= 0) __set->__val[__cnt] = ~0UL; \ 0; })) So the thing that works for you can be replaced with some dummy loop or something, IOW not good enough, we need to understand the problem, and as mentioned above, i can not be of any help here, can't repro, sorry.
On Fri, Oct 09, 2009 at 01:13:12AM +0400, malc wrote: > On Fri, 9 Oct 2009, malc wrote: > > > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > > > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > > > "posix-aio-compat: avoid signal race when spawning a thread" > > > winxp installation on a raw format file fails > > > during disk format, with a message "your > > > disk may be damaged". > > > > > > This commit moved signal mask from aio thread to creating thread. > > > It turns out if we keep the mask in aio thread as well, the problem > > > disappears. It should not be needed, but since this is harmless, let's > > > keep it around until someone inclined to debug pthread library internals > > > can check this issue. > > > > > > While we are at it, convert sigprocmask to pthread_sigmask > > > as per posix. > > > > Thanks for the work you've put into it, i'll try to see how all of this > > works on my x86 machine. > > > > [..snip..] > > Nope, still can not reproduce... So indeed it appears someone is needed, > who: > > a. Can reproduce > b. Cares > > FWIW, in essence sigfillset is this: > > # define __sigfillset(set) \ > (__extension__ ({ int __cnt = _SIGSET_NWORDS; \ > sigset_t *__set = (set); \ > while (--__cnt >= 0) __set->__val[__cnt] = ~0UL; \ > 0; })) > > So the thing that works for you can be replaced with some dummy loop > or something, Exactly. I just did this: sigset_t xset; static void *aio_thread(void *unused) { pid_t pid; sigset_t set; /* block all signals */ /* Should not be necessary as we should inherit mask * from creating thread. However, without this, * on FC11, WinXP installation fails during disk format * saying disk was damaged. pthread library bug? * */ memset(&set, 0, sizeof set); memcpy(&xset, &set, sizeof set); and it also helps. > IOW not good enough, we need to understand the problem, > and as mentioned above, i can not be of any help here, can't repro, > sorry. > > -- > mailto:av1474@comtv.ru
On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > On Fri, Oct 09, 2009 at 01:13:12AM +0400, malc wrote: > > On Fri, 9 Oct 2009, malc wrote: > > > > > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > > > > > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > > > > "posix-aio-compat: avoid signal race when spawning a thread" > > > > winxp installation on a raw format file fails > > > > during disk format, with a message "your > > > > disk may be damaged". > > > > > > > > This commit moved signal mask from aio thread to creating thread. > > > > It turns out if we keep the mask in aio thread as well, the problem > > > > disappears. It should not be needed, but since this is harmless, let's > > > > keep it around until someone inclined to debug pthread library internals > > > > can check this issue. > > > > > > > > While we are at it, convert sigprocmask to pthread_sigmask > > > > as per posix. > > > > > > Thanks for the work you've put into it, i'll try to see how all of this > > > works on my x86 machine. > > > > > > [..snip..] > > > > Nope, still can not reproduce... So indeed it appears someone is needed, > > who: > > > > a. Can reproduce > > b. Cares [..snip..] > > > > So the thing that works for you can be replaced with some dummy loop > > or something, > > Exactly. I just did this: > > sigset_t xset; > > static void *aio_thread(void *unused) > { > pid_t pid; > sigset_t set; > > /* block all signals */ > /* Should not be necessary as we should inherit mask > * from creating thread. However, without this, > * on FC11, WinXP installation fails during disk format > * saying disk was damaged. pthread library bug? > * */ > memset(&set, 0, sizeof set); > memcpy(&xset, &set, sizeof set); > > and it also helps. Now question of the week is - why does pausing the work that aio_thread does helps things? Come to think of it - your machine (judging by -j 8) is a lot faster than anything i have in my vicinity, and that might be why i can not reproduce it...
On Fri, Oct 09, 2009 at 02:17:13AM +0400, malc wrote: > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > On Fri, Oct 09, 2009 at 01:13:12AM +0400, malc wrote: > > > On Fri, 9 Oct 2009, malc wrote: > > > > > > > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > > > > > > > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > > > > > "posix-aio-compat: avoid signal race when spawning a thread" > > > > > winxp installation on a raw format file fails > > > > > during disk format, with a message "your > > > > > disk may be damaged". > > > > > > > > > > This commit moved signal mask from aio thread to creating thread. > > > > > It turns out if we keep the mask in aio thread as well, the problem > > > > > disappears. It should not be needed, but since this is harmless, let's > > > > > keep it around until someone inclined to debug pthread library internals > > > > > can check this issue. > > > > > > > > > > While we are at it, convert sigprocmask to pthread_sigmask > > > > > as per posix. > > > > > > > > Thanks for the work you've put into it, i'll try to see how all of this > > > > works on my x86 machine. > > > > > > > > [..snip..] > > > > > > Nope, still can not reproduce... So indeed it appears someone is needed, > > > who: > > > > > > a. Can reproduce > > > b. Cares > > [..snip..] > > > > > > > So the thing that works for you can be replaced with some dummy loop > > > or something, > > > > Exactly. I just did this: > > > > sigset_t xset; > > > > static void *aio_thread(void *unused) > > { > > pid_t pid; > > sigset_t set; > > > > /* block all signals */ > > /* Should not be necessary as we should inherit mask > > * from creating thread. However, without this, > > * on FC11, WinXP installation fails during disk format > > * saying disk was damaged. pthread library bug? > > * */ > > memset(&set, 0, sizeof set); > > memcpy(&xset, &set, sizeof set); > > > > and it also helps. > > Now question of the week is - why does pausing the work that aio_thread > does helps things? Come to think of it - your machine (judging by -j 8) > is a lot faster than anything i have in my vicinity, and that might be > why i can not reproduce it... No, I think it's just memory layout. We have some memory corruption, using some stack space moves it about to a non-critical place. > -- > mailto:av1474@comtv.ru
Hi all! I need a way to enumerate all emulated network cards. Is there any command line option or should I walk out source? Regards
On Fri, Oct 9, 2009 at 3:06 AM, Natalia Portillo <claunia@claunia.com> wrote: > Hi all! > > I need a way to enumerate all emulated network cards. > Is there any command line option or should I walk out source? I'm not sure what you mean with "enumerate"; if you want to known which cards are compiled in on a given architecture: qemu -net nic,model=? Luca
That's exactly what I meant, thanks! El 09/10/2009, a las 09:54, Luca Tettamanti escribió: > On Fri, Oct 9, 2009 at 3:06 AM, Natalia Portillo > <claunia@claunia.com> wrote: >> Hi all! >> >> I need a way to enumerate all emulated network cards. >> Is there any command line option or should I walk out source? > > I'm not sure what you mean with "enumerate"; if you want to known > which cards are compiled in on a given architecture: > > qemu -net nic,model=? > > Luca
On Fri, 9 Oct 2009, Michael S. Tsirkin wrote: > On Fri, Oct 09, 2009 at 02:17:13AM +0400, malc wrote: > > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: > > > > > On Fri, Oct 09, 2009 at 01:13:12AM +0400, malc wrote: > > > > On Fri, 9 Oct 2009, malc wrote: > > > > > > > > > On Thu, 8 Oct 2009, Michael S. Tsirkin wrote: [..snip..] > > No, I think it's just memory layout. > We have some memory corruption, using some stack space > moves it about to a non-critical place. A theory that only you can proove.. Would be very interesting to see thec culprit regardless of what it is.
On Thu, 2009-10-08 at 22:37 +0200, Michael S. Tsirkin wrote: > With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > "posix-aio-compat: avoid signal race when spawning a thread" > winxp installation on a raw format file fails > during disk format, with a message "your > disk may be damaged". > > This commit moved signal mask from aio thread to creating thread. > It turns out if we keep the mask in aio thread as well, the problem > disappears. It should not be needed, but since this is harmless, let's > keep it around until someone inclined to debug pthread library internals > can check this issue. > > While we are at it, convert sigprocmask to pthread_sigmask > as per posix. FWIW, I just started hitting a boot hang with qemu.git and --enable-kvm on a Fedora 11 machine with a Fedora 11 guest. I bisected it back to malc's commit, found this thread, applied Michael's patch and confirmed that it fixes the problem for me too. Cheers, Mark.
On 10/20/2009 08:39 PM, Mark McLoughlin wrote: > On Thu, 2009-10-08 at 22:37 +0200, Michael S. Tsirkin wrote: >> With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, >> "posix-aio-compat: avoid signal race when spawning a thread" >> winxp installation on a raw format file fails >> during disk format, with a message "your >> disk may be damaged". >> >> This commit moved signal mask from aio thread to creating thread. >> It turns out if we keep the mask in aio thread as well, the problem >> disappears. It should not be needed, but since this is harmless, let's >> keep it around until someone inclined to debug pthread library internals >> can check this issue. >> >> While we are at it, convert sigprocmask to pthread_sigmask >> as per posix. > > FWIW, I just started hitting a boot hang with qemu.git and --enable-kvm > on a Fedora 11 machine with a Fedora 11 guest. > > I bisected it back to malc's commit, found this thread, applied > Michael's patch and confirmed that it fixes the problem for me too. If anybody can send me the output of compiling the "strange" file with and without the patch, both with "-fdump-tree-all -fdump-rtl-all -O2 --save-temps -g" flags, I could try debugging it in GCC. It will be huge, so bz/gz/lzip it. Paolo
On Tue, Oct 20, 2009 at 08:57:13PM +0200, Paolo Bonzini wrote: > On 10/20/2009 08:39 PM, Mark McLoughlin wrote: >> On Thu, 2009-10-08 at 22:37 +0200, Michael S. Tsirkin wrote: >>> With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, >>> "posix-aio-compat: avoid signal race when spawning a thread" >>> winxp installation on a raw format file fails >>> during disk format, with a message "your >>> disk may be damaged". >>> >>> This commit moved signal mask from aio thread to creating thread. >>> It turns out if we keep the mask in aio thread as well, the problem >>> disappears. It should not be needed, but since this is harmless, let's >>> keep it around until someone inclined to debug pthread library internals >>> can check this issue. >>> >>> While we are at it, convert sigprocmask to pthread_sigmask >>> as per posix. >> >> FWIW, I just started hitting a boot hang with qemu.git and --enable-kvm >> on a Fedora 11 machine with a Fedora 11 guest. >> >> I bisected it back to malc's commit, found this thread, applied >> Michael's patch and confirmed that it fixes the problem for me too. > > If anybody can send me the output of compiling the "strange" file with > and without the patch, both with "-fdump-tree-all -fdump-rtl-all -O2 > --save-temps -g" flags, I could try debugging it in GCC. > > It will be huge, so bz/gz/lzip it. I've uploaded them here: http://www.kernel.org/pub/linux/kernel/people/mst/ you can't see them in mirrors yet but will be able to soon when kernel.org mirroring system catches them. > Paolo
On Wed, Oct 21, 2009 at 05:42:34PM +0200, Michael S. Tsirkin wrote: > On Tue, Oct 20, 2009 at 08:57:13PM +0200, Paolo Bonzini wrote: > > On 10/20/2009 08:39 PM, Mark McLoughlin wrote: > >> On Thu, 2009-10-08 at 22:37 +0200, Michael S. Tsirkin wrote: > >>> With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, > >>> "posix-aio-compat: avoid signal race when spawning a thread" > >>> winxp installation on a raw format file fails > >>> during disk format, with a message "your > >>> disk may be damaged". > >>> > >>> This commit moved signal mask from aio thread to creating thread. > >>> It turns out if we keep the mask in aio thread as well, the problem > >>> disappears. It should not be needed, but since this is harmless, let's > >>> keep it around until someone inclined to debug pthread library internals > >>> can check this issue. > >>> > >>> While we are at it, convert sigprocmask to pthread_sigmask > >>> as per posix. > >> > >> FWIW, I just started hitting a boot hang with qemu.git and --enable-kvm > >> on a Fedora 11 machine with a Fedora 11 guest. > >> > >> I bisected it back to malc's commit, found this thread, applied > >> Michael's patch and confirmed that it fixes the problem for me too. > > > > If anybody can send me the output of compiling the "strange" file with > > and without the patch, both with "-fdump-tree-all -fdump-rtl-all -O2 > > --save-temps -g" flags, I could try debugging it in GCC. > > > > It will be huge, so bz/gz/lzip it. > > I've uploaded them here: > http://www.kernel.org/pub/linux/kernel/people/mst/ > you can't see them in mirrors yet but will be able to soon when > kernel.org mirroring system catches them. ok, it's there now: http://www.kernel.org/pub/linux/kernel/people/mst/posix-aio-corruption.tbz2 once you open this, you will see subdirectories "hang" and "nohang". the source file is also there, so that you can compare. > > > Paolo
> I've uploaded them here: > http://www.kernel.org/pub/linux/kernel/people/mst/ > you can't see them in mirrors yet but will be able to soon when > kernel.org mirroring system catches them. There is no difference in optimizations except that here: for (i = 0; i < aiocb->aio_niov && count; ++i) { one of the two versions actually does "count && i < aiocb->aio_niov" due to hashing vagaries. This is irrelevant anyway. Same inlining, same loop optimization decisions, same everything else. So a GCC bug can be ruled out, IMHO. The only difference, as someone already suspected, is the padding---the sigset is placed between the top of the frame and the other variables, which may hide an overrun. This is quite amazing for a function that has no arrays, but still is the only evidence. I suggest trying to make the sigset_t static, since that generates exactly the same code as the "nohang" case, and exactly the same stack layout as the "hang" case. The next obvious step would be placing a watchpoint somewhere. Cheers, Paolo
On Wed, Oct 21, 2009 at 07:28:54PM +0200, Paolo Bonzini wrote: >> I've uploaded them here: >> http://www.kernel.org/pub/linux/kernel/people/mst/ >> you can't see them in mirrors yet but will be able to soon when >> kernel.org mirroring system catches them. > > There is no difference in optimizations except that here: > > for (i = 0; i < aiocb->aio_niov && count; ++i) { > > one of the two versions actually does "count && i < aiocb->aio_niov" due > to hashing vagaries. This is irrelevant anyway. Same inlining, same > loop optimization decisions, same everything else. So a GCC bug can be > ruled out, IMHO. > > The only difference, as someone already suspected, is the padding---the > sigset is placed between the top of the frame and the other variables, > which may hide an overrun. This is quite amazing for a function that > has no arrays, but still is the only evidence. > > I suggest trying to make the sigset_t static, since that generates > exactly the same code as the "nohang" case, and exactly the same stack > layout as the "hang" case. The next obvious step would be placing a > watchpoint somewhere. Yes, but where? > Cheers, > > Paolo
>> I suggest trying to make the sigset_t static, since that generates >> exactly the same code as the "nohang" case, and exactly the same stack >> layout as the "hang" case. (In case this wasn't clear: the sigfillset of a static sigset_t should hang, proving that it's stack layout that comes to the rescue). >> The next obvious step would be placing a >> watchpoint somewhere. > > Yes, but where? At every word of the sigset (using gdb commands to disable/enable the watchpoints around the sigfillset, you avoid spurious triggers). One of those words will be overwritten if an overrun would have smashed the stack. If it does not fire, s/sigfillset/sigemptyset/ in case it was writing 0xffffffff. If it still does not fire, dunno. :-( Paolo
On Wed, Oct 21, 2009 at 07:44:14PM +0200, Paolo Bonzini wrote: > >>> I suggest trying to make the sigset_t static, since that generates >>> exactly the same code as the "nohang" case, and exactly the same stack >>> layout as the "hang" case. > > (In case this wasn't clear: the sigfillset of a static sigset_t should > hang, proving that it's stack layout that comes to the rescue). > >>> The next obvious step would be placing a >>> watchpoint somewhere. >> >> Yes, but where? > > At every word of the sigset (using gdb commands to disable/enable the > watchpoints around the sigfillset, you avoid spurious triggers). Not sure how do you mean. When would I enable the watchpoint? > One of > those words will be overwritten if an overrun would have smashed the > stack. If it does not fire, s/sigfillset/sigemptyset/ in case it was > writing 0xffffffff. If it still does not fire, dunno. :-( > > Paolo
>> At every word of the sigset (using gdb commands to disable/enable the >> watchpoints around the sigfillset, you avoid spurious triggers). > > Not sure how do you mean. When would I enable the watchpoint? 301 static void *aio_thread(void *unused) 302 { 303 pid_t pid; 304 static sigset_t set; 305 306 /* block all signals */ ... 312 if (sigfillset(&set)) die("sigfillset"); .... 315 pid = getpid(); 316 317 while (1) { b 312 // sets bp 1 p ((long *)&set // prints $1 watch $1[0] // sets bp 2..9 watch $1[1] watch $1[2] watch $1[3] watch $1[4] watch $1[5] watch $1[6] watch $1[7] // only 1/4 of the sigset! // stop earlier if it starts // assigning non-hardware watchpoints // try again with 8..15 if not enough // up to $1[31] commands 1 // don't break on next sigfillset dis 2 dis 3 dis 4 dis 5 dis 6 dis 7 dis 8 dis 9 c // and continue end b 315 // sets bp 10 commands 10 // re-enable after sigfillset ena 2 ena 3 ena 4 ena 5 ena 6 ena 7 ena 8 ena 9 c // run body of aio_thread end c // run into aio_thread Paolo
diff --git a/posix-aio-compat.c b/posix-aio-compat.c index 400d898..4abbe3a 100644 --- a/posix-aio-compat.c +++ b/posix-aio-compat.c @@ -301,6 +301,16 @@ static size_t handle_aiocb_rw(struct qemu_paiocb *aiocb) static void *aio_thread(void *unused) { pid_t pid; + sigset_t set; + + /* block all signals */ + /* Should not be necessary as we should inherit mask + * from creating thread. However, without this, + * on FC11, using raw file, WinXP installation fails during disk format + * saying disk was damaged. pthread library bug? + * */ + if (sigfillset(&set)) die("sigfillset"); + if (pthread_sigmask(SIG_BLOCK, &set, NULL)) die("pthread_sigmask"); pid = getpid(); @@ -371,11 +381,11 @@ static void spawn_thread(void) /* block all signals */ if (sigfillset(&set)) die("sigfillset"); - if (sigprocmask(SIG_SETMASK, &set, &oldset)) die("sigprocmask"); + if (pthread_sigmask(SIG_SETMASK, &set, &oldset)) die("pthread_sigmask"); thread_create(&thread_id, &attr, aio_thread, NULL); - if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore"); + if (pthread_sigmask(SIG_SETMASK, &oldset, NULL)) die("pthread_sigmask restore"); } static void qemu_paio_submit(struct qemu_paiocb *aiocb)
With commit ee3993069ff55fa6f1c64daf1e09963e340db8e4, "posix-aio-compat: avoid signal race when spawning a thread" winxp installation on a raw format file fails during disk format, with a message "your disk may be damaged". This commit moved signal mask from aio thread to creating thread. It turns out if we keep the mask in aio thread as well, the problem disappears. It should not be needed, but since this is harmless, let's keep it around until someone inclined to debug pthread library internals can check this issue. While we are at it, convert sigprocmask to pthread_sigmask as per posix. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> For the benefit of whoever tries to debug this any deeper here's documentation on how to reproduce the issue to be saved in git logs (need to tweak paths obviously): The test was done on FC11, 32 bit userspace, 64 bit kernel 2.6.31. ./configure --prefix=/scm/qemu-kvm-debug --target-list=x86_64-softmmu \ --enable-kvm make -j 8 && make install rm /scm/images/winxp-bisect.raw /scm/qemu-kvm-debug/bin/qemu-img create -f raw \ /scm/images/winxp-bisect.raw 2G /scm/qemu-kvm-debug/bin/qemu-system-x86_64 -enable-kvm -cdrom \ /home/mst/en_windows_xp_professional_with_service_pack_3_x86_cd_x14-80428.iso \ /scm/images/winxp-bisect.raw Select format to NTFS. --- OK, I'm out of time with this issue, let's apply a work-around until someone interested comes around? posix-aio-compat.c | 14 ++++++++++++--