Message ID | 1276543644-32689-1-git-send-email-glommer@redhat.com |
---|---|
State | New |
Headers | show |
On 06/14/2010 02:27 PM, Glauber Costa wrote: > This patch fixes a bug that happens with kvm, irqchip-in-kernel, > while adding a netdev. Despite the situations of reproduction being > specific to kvm, I believe this fix is pretty generic, and fits here. > Specially if we ever want to have our own irqchip in kernel too. > > The problem happens after the fork system call, and although it is not > 100 % reproduceable, happens pretty often. After fork, the memory where > the apic is mapped is present in both processes. It ends up confusing > the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > irqs being delivered at all from that point on. > > Making sure the vcpus are stopped before forking makes the problem go > away. Besides, this is a pretty unfrequent operation, which already hangs > the io-thread for a while. So it should not hurt performance. > > Signed-off-by: Glauber Costa<glommer@redhat.com> > This doesn't make very much sense to me but smells like a kernel bug to me. Even if it isn't, I can't rationalize why stopping the vm like this is enough to fix such a problem. Is the problem that the KVM VCPU threads get duplicated while potentially running or something like that? Regards, Anthony Liguori > --- > net/tap.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/net/tap.c b/net/tap.c > index 0147dab..f34dd9c 100644 > --- a/net/tap.c > +++ b/net/tap.c > @@ -330,6 +330,9 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) > sigaddset(&mask, SIGCHLD); > sigprocmask(SIG_BLOCK,&mask,&oldmask); > > + /* make sure no cpus are running, so the apic does not > + * get confused */ > + vm_stop(0); > /* try to launch network script */ > pid = fork(); > if (pid == 0) { > @@ -350,6 +353,7 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) > execv(setup_script, args); > _exit(1); > } else if (pid> 0) { > + vm_start(); > while (waitpid(pid,&status, 0) != pid) { > /* loop */ > } >
On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote: > On 06/14/2010 02:27 PM, Glauber Costa wrote: > >This patch fixes a bug that happens with kvm, irqchip-in-kernel, > >while adding a netdev. Despite the situations of reproduction being > >specific to kvm, I believe this fix is pretty generic, and fits here. > >Specially if we ever want to have our own irqchip in kernel too. > > > >The problem happens after the fork system call, and although it is not > >100 % reproduceable, happens pretty often. After fork, the memory where > >the apic is mapped is present in both processes. It ends up confusing > >the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > >irqs being delivered at all from that point on. > > > >Making sure the vcpus are stopped before forking makes the problem go > >away. Besides, this is a pretty unfrequent operation, which already hangs > >the io-thread for a while. So it should not hurt performance. > > > >Signed-off-by: Glauber Costa<glommer@redhat.com> > > This doesn't make very much sense to me but smells like a kernel bug to me. My interpretation is that by doing that, we make sure no in-flight requests are happening. Actually, a sleep(x), with x sufficiently big is enough to make this problem go away, but that is too hacky. I do agree that this is most likely a kernel bug. But as with any other kernel bugs, I believe this is a easy workaround to have things working even in older kernels until we fix it. > > Even if it isn't, I can't rationalize why stopping the vm like this > is enough to fix such a problem. Is the problem that the KVM VCPU > threads get duplicated while potentially running or something like > that? I doubt fork is duplicating the vcpu threads. More than that, this bug does not happen with userspace irqchip. So I believe that either irq request or the ack itself is reaching the wrong process, forever stalling the apic.
On 06/14/2010 02:42 PM, Glauber Costa wrote: > On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote: > >> On 06/14/2010 02:27 PM, Glauber Costa wrote: >> >>> This patch fixes a bug that happens with kvm, irqchip-in-kernel, >>> while adding a netdev. Despite the situations of reproduction being >>> specific to kvm, I believe this fix is pretty generic, and fits here. >>> Specially if we ever want to have our own irqchip in kernel too. >>> >>> The problem happens after the fork system call, and although it is not >>> 100 % reproduceable, happens pretty often. After fork, the memory where >>> the apic is mapped is present in both processes. It ends up confusing >>> the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no >>> irqs being delivered at all from that point on. >>> >>> Making sure the vcpus are stopped before forking makes the problem go >>> away. Besides, this is a pretty unfrequent operation, which already hangs >>> the io-thread for a while. So it should not hurt performance. >>> >>> Signed-off-by: Glauber Costa<glommer@redhat.com> >>> >> This doesn't make very much sense to me but smells like a kernel bug to me. >> > My interpretation is that by doing that, we make sure no in-flight > requests are happening. Actually, a sleep(x), with x sufficiently big > is enough to make this problem go away, but that is too hacky. > vm_stop() is probably just acting a glorified sleep() since it has to wait for each thread to stop. > I do agree that this is most likely a kernel bug. But as with any other > kernel bugs, I believe this is a easy workaround to have things working > even in older kernels until we fix it. > If we don't know what the bug is, then we do not know whether this is a work around. Rather, this change happens to make the bug more difficult to reproduce with your test case. >> Even if it isn't, I can't rationalize why stopping the vm like this >> is enough to fix such a problem. Is the problem that the KVM VCPU >> threads get duplicated while potentially running or something like >> that? >> > I doubt fork is duplicating the vcpu threads. More than that, this > bug does not happen with userspace irqchip. > So I believe that either irq request or the ack itself is reaching the > wrong process, forever stalling the apic. > That sounds more like a signal delivery issue. It's not obvious to me that we're doing the wrong thing with signal mask though. If it's a signal mask related issue, then vm_stop isn't a proper fix as there would be still be a race. Regards, Anthony Liguori
On Mon, Jun 14, 2010 at 02:58:47PM -0500, Anthony Liguori wrote: > On 06/14/2010 02:42 PM, Glauber Costa wrote: > >On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote: > >>On 06/14/2010 02:27 PM, Glauber Costa wrote: > >>>This patch fixes a bug that happens with kvm, irqchip-in-kernel, > >>>while adding a netdev. Despite the situations of reproduction being > >>>specific to kvm, I believe this fix is pretty generic, and fits here. > >>>Specially if we ever want to have our own irqchip in kernel too. > >>> > >>>The problem happens after the fork system call, and although it is not > >>>100 % reproduceable, happens pretty often. After fork, the memory where > >>>the apic is mapped is present in both processes. It ends up confusing > >>>the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > >>>irqs being delivered at all from that point on. > >>> > >>>Making sure the vcpus are stopped before forking makes the problem go > >>>away. Besides, this is a pretty unfrequent operation, which already hangs > >>>the io-thread for a while. So it should not hurt performance. > >>> > >>>Signed-off-by: Glauber Costa<glommer@redhat.com> > >>This doesn't make very much sense to me but smells like a kernel bug to me. > >My interpretation is that by doing that, we make sure no in-flight > >requests are happening. Actually, a sleep(x), with x sufficiently big > >is enough to make this problem go away, but that is too hacky. > > vm_stop() is probably just acting a glorified sleep() since it has > to wait for each thread to stop. > > >I do agree that this is most likely a kernel bug. But as with any other > >kernel bugs, I believe this is a easy workaround to have things working > >even in older kernels until we fix it. > > If we don't know what the bug is, then we do not know whether this > is a work around. Rather, this change happens to make the bug more > difficult to reproduce with your test case. > > >>Even if it isn't, I can't rationalize why stopping the vm like this > >>is enough to fix such a problem. Is the problem that the KVM VCPU > >>threads get duplicated while potentially running or something like > >>that? > >I doubt fork is duplicating the vcpu threads. More than that, this > >bug does not happen with userspace irqchip. > >So I believe that either irq request or the ack itself is reaching the > >wrong process, forever stalling the apic. > > That sounds more like a signal delivery issue. It's not obvious to > me that we're doing the wrong thing with signal mask though. > > If it's a signal mask related issue, then vm_stop isn't a proper fix > as there would be still be a race. I do can investigate it further, but I doubt it is signal-delivery related. I spent the first days believing it was, but now, I believe it is much more likely to be apic-related. We don't need to wait for the child to exit for this bug to happen, so SIGCHLD is never raised. And with in-kernel irqchip, we don't deliver signals during normal vcpu execution.
On Mon, Jun 14, 2010 at 04:42:37PM -0300, Glauber Costa wrote: > On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote: > > On 06/14/2010 02:27 PM, Glauber Costa wrote: > > >This patch fixes a bug that happens with kvm, irqchip-in-kernel, > > >while adding a netdev. Despite the situations of reproduction being > > >specific to kvm, I believe this fix is pretty generic, and fits here. > > >Specially if we ever want to have our own irqchip in kernel too. > > > > > >The problem happens after the fork system call, and although it is not > > >100 % reproduceable, happens pretty often. After fork, the memory where > > >the apic is mapped is present in both processes. It ends up confusing > > >the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > > >irqs being delivered at all from that point on. > > > > > >Making sure the vcpus are stopped before forking makes the problem go > > >away. Besides, this is a pretty unfrequent operation, which already hangs > > >the io-thread for a while. So it should not hurt performance. > > > > > >Signed-off-by: Glauber Costa<glommer@redhat.com> > > > > This doesn't make very much sense to me but smells like a kernel bug to me. > My interpretation is that by doing that, we make sure no in-flight > requests are happening. Actually, a sleep(x), with x sufficiently big > is enough to make this problem go away, but that is too hacky. > > I do agree that this is most likely a kernel bug. But as with any other > kernel bugs, I believe this is a easy workaround to have things working > even in older kernels until we fix it. > > > > > Even if it isn't, I can't rationalize why stopping the vm like this > > is enough to fix such a problem. Is the problem that the KVM VCPU > > threads get duplicated while potentially running or something like > > that? > > I doubt fork is duplicating the vcpu threads. More than that, this > bug does not happen with userspace irqchip. Hmm, could this problem be hitting other places where we fork() besides the netdev helper script ? I have seen a intermittent bug[1] with migration, where using exec: migration will freeze the entire guest and setting -no-kvm-irqchip fixes the problem Regards, Daniel [1] https://bugzilla.redhat.com/show_bug.cgi?id=585195
On 06/14/2010 10:33 PM, Anthony Liguori wrote: > On 06/14/2010 02:27 PM, Glauber Costa wrote: >> This patch fixes a bug that happens with kvm, irqchip-in-kernel, >> while adding a netdev. Despite the situations of reproduction being >> specific to kvm, I believe this fix is pretty generic, and fits here. >> Specially if we ever want to have our own irqchip in kernel too. >> >> The problem happens after the fork system call, and although it is not >> 100 % reproduceable, happens pretty often. After fork, the memory where >> the apic is mapped is present in both processes. It ends up confusing >> the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no >> irqs being delivered at all from that point on. >> >> Making sure the vcpus are stopped before forking makes the problem go >> away. Besides, this is a pretty unfrequent operation, which already >> hangs >> the io-thread for a while. So it should not hurt performance. > > This doesn't make very much sense to me but smells like a kernel bug > to me. It is, and the fix would be to create the APIC memory slot as sharable across forks (should be easy to fix in the kernel). > Even if it isn't, I can't rationalize why stopping the vm like this is > enough to fix such a problem. Is the problem that the KVM VCPU > threads get duplicated while potentially running or something like that? I think it's COW triggering a copy on one vcpu while the other reads from the old page.
On 06/14/2010 10:42 PM, Glauber Costa wrote: > I do agree that this is most likely a kernel bug. But as with any other > kernel bugs, I believe this is a easy workaround to have things working > even in older kernels until we fix it. > That's what -stable is for. We fix it upstream and backport it. No need to complicate userspace for something which isn't its fault.
On 06/15/2010 09:14 AM, Daniel P. Berrange wrote: > > Hmm, could this problem be hitting other places where we fork() besides > the netdev helper script ? I have seen a intermittent bug[1] with migration, > where using exec: migration will freeze the entire guest and setting > -no-kvm-irqchip fixes the problem > > Regards, > Daniel > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=585195 > Looks very similar, I never thought of the APIC pages in that connection. Kudos to Glauber for tracking it down.
On Mon, Jun 14, 2010 at 02:58:47PM -0500, Anthony Liguori wrote: > On 06/14/2010 02:42 PM, Glauber Costa wrote: > >On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote: > >>On 06/14/2010 02:27 PM, Glauber Costa wrote: > >>>This patch fixes a bug that happens with kvm, irqchip-in-kernel, > >>>while adding a netdev. Despite the situations of reproduction being > >>>specific to kvm, I believe this fix is pretty generic, and fits here. > >>>Specially if we ever want to have our own irqchip in kernel too. > >>> > >>>The problem happens after the fork system call, and although it is not > >>>100 % reproduceable, happens pretty often. After fork, the memory where > >>>the apic is mapped is present in both processes. It ends up confusing > >>>the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > >>>irqs being delivered at all from that point on. > >>> > >>>Making sure the vcpus are stopped before forking makes the problem go > >>>away. Besides, this is a pretty unfrequent operation, which already hangs > >>>the io-thread for a while. So it should not hurt performance. > >>> > >>>Signed-off-by: Glauber Costa<glommer@redhat.com> > >>This doesn't make very much sense to me but smells like a kernel bug to me. > >My interpretation is that by doing that, we make sure no in-flight > >requests are happening. Actually, a sleep(x), with x sufficiently big > >is enough to make this problem go away, but that is too hacky. > > vm_stop() is probably just acting a glorified sleep() since it has > to wait for each thread to stop. No, it is not. It also makes sure no vcpus are running, and thus, not generating new requests (or waiting for any replies, for that matter). (Note: I am not advocating the inclusion of this, just trying to build my own awareness)
On Tue, Jun 15, 2010 at 10:33:21AM +0300, Avi Kivity wrote: > On 06/14/2010 10:33 PM, Anthony Liguori wrote: > >On 06/14/2010 02:27 PM, Glauber Costa wrote: > >>This patch fixes a bug that happens with kvm, irqchip-in-kernel, > >>while adding a netdev. Despite the situations of reproduction being > >>specific to kvm, I believe this fix is pretty generic, and fits here. > >>Specially if we ever want to have our own irqchip in kernel too. > >> > >>The problem happens after the fork system call, and although it is not > >>100 % reproduceable, happens pretty often. After fork, the memory where > >>the apic is mapped is present in both processes. It ends up confusing > >>the vcpus somewhere in the irq<-> ack path, and qemu hangs, with no > >>irqs being delivered at all from that point on. > >> > >>Making sure the vcpus are stopped before forking makes the problem go > >>away. Besides, this is a pretty unfrequent operation, which > >>already hangs > >>the io-thread for a while. So it should not hurt performance. > > > >This doesn't make very much sense to me but smells like a kernel > >bug to me. > > It is, and the fix would be to create the APIC memory slot as > sharable across forks (should be easy to fix in the kernel). Kernel pages are already shared across fork, no?
On 06/16/2010 07:58 PM, Glauber Costa wrote: > >> It is, and the fix would be to create the APIC memory slot as >> sharable across forks (should be easy to fix in the kernel). >> > Kernel pages are already shared across fork, no? > What are kernel pages in this context? The APIC access page is an ordinary user page.
diff --git a/net/tap.c b/net/tap.c index 0147dab..f34dd9c 100644 --- a/net/tap.c +++ b/net/tap.c @@ -330,6 +330,9 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) sigaddset(&mask, SIGCHLD); sigprocmask(SIG_BLOCK, &mask, &oldmask); + /* make sure no cpus are running, so the apic does not + * get confused */ + vm_stop(0); /* try to launch network script */ pid = fork(); if (pid == 0) { @@ -350,6 +353,7 @@ static int launch_script(const char *setup_script, const char *ifname, int fd) execv(setup_script, args); _exit(1); } else if (pid > 0) { + vm_start(); while (waitpid(pid, &status, 0) != pid) { /* loop */ }
This patch fixes a bug that happens with kvm, irqchip-in-kernel, while adding a netdev. Despite the situations of reproduction being specific to kvm, I believe this fix is pretty generic, and fits here. Specially if we ever want to have our own irqchip in kernel too. The problem happens after the fork system call, and although it is not 100 % reproduceable, happens pretty often. After fork, the memory where the apic is mapped is present in both processes. It ends up confusing the vcpus somewhere in the irq <-> ack path, and qemu hangs, with no irqs being delivered at all from that point on. Making sure the vcpus are stopped before forking makes the problem go away. Besides, this is a pretty unfrequent operation, which already hangs the io-thread for a while. So it should not hurt performance. Signed-off-by: Glauber Costa <glommer@redhat.com> --- net/tap.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)