Message ID | 1305729469-12235-1-git-send-email-stefan.bader@canonical.com |
---|---|
State | New |
Headers | show |
On 05/18/2011 07:37 AM, Stefan Bader wrote: > We got it into Lucid/Lucid-ec2 but people also use Maverick (all kinds) > kernels in their Xen environment. > > SRU Justification: > > Impact: With the current kernels the kernel oops described in comment #3 > is experienced as a result of enabling interrupts on the pv spinlock event > channel. > > Fix: The following patch is taken from upstream and is included in 2.6.37. > It has been reported to successfully prevent the oops. > > Testcase: Migration of a guest (using suspend) > > > From cf2e26cf8402af6f65bd89611682497db278f309 Mon Sep 17 00:00:00 2001 > From: Ian Campbell <ian.campbell@citrix.com> > Date: Mon, 1 Nov 2010 16:30:09 +0000 > Subject: [PATCH] xen: events: do not unmask event channels on resume > > The IRQ core code will take care of disabling and reenabling > interrupts over suspend resume automatically, therefore we do not need > to do this in the Xen event channel code. > > The only exception is those event channels marked IRQF_NO_SUSPEND > which the IRQ core ignores. We must unmask these ourselves, taking > care to obey the current IRQ_DISABLED status. Failure check for > IRQ_DISABLED leads to enabling polled only event channels, such as > that associated with the pv spinlocks, which must never be enabled: > > [ 21.970432] ------------[ cut here ]------------ > [ 21.970432] kernel BUG at arch/x86/xen/spinlock.c:343! > [ 21.970432] invalid opcode: 0000 [#1] SMP > [ 21.970432] last sysfs file: /sys/devices/virtual/net/lo/operstate > [ 21.970432] Modules linked in: > [ 21.970432] > [ 21.970432] Pid: 0, comm: swapper Not tainted (2.6.32.24-x86_32p-xen-01034-g787c727 #34) > [ 21.970432] EIP: 0061:[<c102e209>] EFLAGS: 00010046 CPU: 3 > [ 21.970432] EIP is at dummy_handler+0x3/0x7 > [ 21.970432] EAX: 0000021c EBX: dfc16880 ECX: 0000001a EDX: 00000000 > [ 21.970432] ESI: dfc02c00 EDI: 00000001 EBP: dfc47e10 ESP: dfc47e10 > [ 21.970432] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 > [ 21.970432] Process swapper (pid: 0, ti=dfc46000 task=dfc39440 task.ti=dfc46000) > [ 21.970432] Stack: > [ 21.970432] dfc47e30 c10a39f0 0000021c 00000000 00000000 dfc16880 0000021c 00000001 > [ 21.970432] <0> dfc47e40 c10a4f08 0000021c 00000000 dfc47e78 c12240a7 c1839284 c1839284 > [ 21.970432] <0> 00000200 00000000 00000000 f5720000 c1f3d028 c1f3d02c 00000180 dfc47e90 > [ 21.970432] Call Trace: > [ 21.970432] [<c10a39f0>] ? handle_IRQ_event+0x5f/0x122 > [ 21.970432] [<c10a4f08>] ? handle_percpu_irq+0x2f/0x55 > [ 21.970432] [<c12240a7>] ? __xen_evtchn_do_upcall+0xdb/0x15f > [ 21.970432] [<c122481e>] ? xen_evtchn_do_upcall+0x20/0x30 > [ 21.970432] [<c1030d47>] ? xen_do_upcall+0x7/0xc > [ 21.970432] [<c102007b>] ? apic_reg_read+0xd3/0x22d > [ 21.970432] [<c1002227>] ? hypercall_page+0x227/0x1005 > [ 21.970432] [<c102d30b>] ? xen_force_evtchn_callback+0xf/0x14 > [ 21.970432] [<c102da7c>] ? check_events+0x8/0xc > [ 21.970432] [<c102da3b>] ? xen_irq_enable_direct_end+0x0/0x1 > [ 21.970432] [<c105e485>] ? finish_task_switch+0x62/0xba > [ 21.970432] [<c14e3f84>] ? schedule+0x808/0x89d > [ 21.970432] [<c1084dc5>] ? hrtimer_start_expires+0x1a/0x22 > [ 21.970432] [<c1085154>] ? tick_nohz_restart_sched_tick+0x15a/0x162 > [ 21.970432] [<c102f43a>] ? cpu_idle+0x6d/0x6f > [ 21.970432] [<c14db29e>] ? cpu_bringup_and_idle+0xd/0xf > [ 21.970432] Code: 5d 0f 95 c0 0f b6 c0 c3 55 66 83 78 02 00 89 e5 5d 0f 95 \ > c0 0f b6 c0 c3 55 b2 01 86 10 31 c0 84 d2 89 e5 0f 94 c0 5d c3 55 89 e5 <0f> 0b \ > eb fe 55 80 3d 4c ce 84 c1 00 89 e5 57 56 89 c6 53 74 15 > [ 21.970432] EIP: [<c102e209>] dummy_handler+0x3/0x7 SS:ESP 0069:dfc47e10 > [ 21.970432] ---[ end trace c0b71f7e12cf3011 ]--- > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > BugLink: http://bugs.launchpad.net/bugs/681083 > > (cherry-picked from commit 6903591f314b8947d0e362bda7715e90eb9df75e upstream) > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: John Johansen <john.johansen@canonical.com> > --- > drivers/xen/events.c | 25 ++++++++++++++++++------- > 1 files changed, 18 insertions(+), 7 deletions(-) > > diff --git a/drivers/xen/events.c b/drivers/xen/events.c > index 0769108..9c66deb 100644 > --- a/drivers/xen/events.c > +++ b/drivers/xen/events.c > @@ -819,9 +819,6 @@ static void restore_cpu_virqs(unsigned int cpu) > evtchn_to_irq[evtchn] = irq; > irq_info[irq] = mk_virq_info(evtchn, virq); > bind_evtchn_to_cpu(evtchn, cpu); > - > - /* Ready for use. */ > - unmask_evtchn(evtchn); > } > } > > @@ -847,10 +844,6 @@ static void restore_cpu_ipis(unsigned int cpu) > evtchn_to_irq[evtchn] = irq; > irq_info[irq] = mk_ipi_info(evtchn, ipi); > bind_evtchn_to_cpu(evtchn, cpu); > - > - /* Ready for use. */ > - unmask_evtchn(evtchn); > - > } > } > > @@ -903,6 +896,7 @@ void xen_poll_irq(int irq) > void xen_irq_resume(void) > { > unsigned int cpu, irq, evtchn; > + struct irq_desc *desc; > > init_evtchn_cpu_bindings(); > > @@ -921,6 +915,23 @@ void xen_irq_resume(void) > restore_cpu_virqs(cpu); > restore_cpu_ipis(cpu); > } > + > + /* > + * Unmask any IRQF_NO_SUSPEND IRQs which are enabled. These > + * are not handled by the IRQ core. > + */ > + for_each_irq_desc(irq, desc) { > + if (!desc->action || !(desc->action->flags & IRQF_NO_SUSPEND)) > + continue; > + if (desc->status & IRQ_DISABLED) > + continue; > + > + evtchn = evtchn_from_irq(irq); > + if (evtchn == -1) > + continue; > + > + unmask_evtchn(evtchn); > + } > } > > static struct irq_chip xen_dynamic_chip __read_mostly = {
applied. I'm guessing this ought to go to stable ? Andi Kleen is still maintaining that version IIRC.
On 05/18/2011 05:38 PM, Tim Gardner wrote: > applied. I'm guessing this ought to go to stable ? Andi Kleen is still > maintaining that version IIRC. It does not seem to fix all cases of the migration problem (at least not in Lucid). Though some reported success there. So I suspect there might be another related issue. I would want to wait with stable until sure about the exact set.
On 05/19/2011 02:24 AM, Stefan Bader wrote: > On 05/18/2011 05:38 PM, Tim Gardner wrote: >> applied. I'm guessing this ought to go to stable ? Andi Kleen is still >> maintaining that version IIRC. > > It does not seem to fix all cases of the migration problem (at least not in > Lucid). Though some reported success there. So I suspect there might be another > related issue. I would want to wait with stable until sure about the exact set. > If its not good enough for upstream stable, then why would we apply it?
On 25.05.2011 16:27, Tim Gardner wrote: > On 05/19/2011 02:24 AM, Stefan Bader wrote: >> On 05/18/2011 05:38 PM, Tim Gardner wrote: >>> applied. I'm guessing this ought to go to stable ? Andi Kleen is still >>> maintaining that version IIRC. >> >> It does not seem to fix all cases of the migration problem (at least not in >> Lucid). Though some reported success there. So I suspect there might be another >> related issue. I would want to wait with stable until sure about the exact set. >> > > If its not good enough for upstream stable, then why would we apply it? > It is good enough. What I want to avoid is to miss something that needs to go as well. Though since .35 seems currently to break Xen (with a different patch) I probably am just too cautious. And its hard to tell with many reporters some claim success, some partial, some none (at least for the lucid version) to say for sure. -Stefan
On 25.05.2011 16:34, Stefan Bader wrote: > On 25.05.2011 16:27, Tim Gardner wrote: >> On 05/19/2011 02:24 AM, Stefan Bader wrote: >>> On 05/18/2011 05:38 PM, Tim Gardner wrote: >>>> applied. I'm guessing this ought to go to stable ? Andi Kleen is still >>>> maintaining that version IIRC. >>> >>> It does not seem to fix all cases of the migration problem (at least not in >>> Lucid). Though some reported success there. So I suspect there might be another >>> related issue. I would want to wait with stable until sure about the exact set. >>> >> >> If its not good enough for upstream stable, then why would we apply it? >> > It is good enough. What I want to avoid is to miss something that needs to go as > well. Though since .35 seems currently to break Xen (with a different patch) I > probably am just too cautious. > And its hard to tell with many reporters some claim success, some partial, some > none (at least for the lucid version) to say for sure. > > -Stefan > Ok I started discussion on xen-devel about this at least.
On 05/25/2011 08:49 AM, Stefan Bader wrote: > On 25.05.2011 16:34, Stefan Bader wrote: >> On 25.05.2011 16:27, Tim Gardner wrote: >>> On 05/19/2011 02:24 AM, Stefan Bader wrote: >>>> On 05/18/2011 05:38 PM, Tim Gardner wrote: >>>>> applied. I'm guessing this ought to go to stable ? Andi Kleen is still >>>>> maintaining that version IIRC. >>>> >>>> It does not seem to fix all cases of the migration problem (at least >>>> not in >>>> Lucid). Though some reported success there. So I suspect there might >>>> be another >>>> related issue. I would want to wait with stable until sure about the >>>> exact set. >>>> >>> >>> If its not good enough for upstream stable, then why would we apply it? >>> >> It is good enough. What I want to avoid is to miss something that >> needs to go as >> well. Though since .35 seems currently to break Xen (with a different >> patch) I >> probably am just too cautious. >> And its hard to tell with many reporters some claim success, some >> partial, some >> none (at least for the lucid version) to say for sure. >> >> -Stefan >> > Ok I started discussion on xen-devel about this at least. > Any forward motion on this ?
On 06.06.2011 17:09, Tim Gardner wrote: > On 05/25/2011 08:49 AM, Stefan Bader wrote: >> On 25.05.2011 16:34, Stefan Bader wrote: >>> On 25.05.2011 16:27, Tim Gardner wrote: >>>> On 05/19/2011 02:24 AM, Stefan Bader wrote: >>>>> On 05/18/2011 05:38 PM, Tim Gardner wrote: >>>>>> applied. I'm guessing this ought to go to stable ? Andi Kleen is still >>>>>> maintaining that version IIRC. >>>>> >>>>> It does not seem to fix all cases of the migration problem (at least >>>>> not in >>>>> Lucid). Though some reported success there. So I suspect there might >>>>> be another >>>>> related issue. I would want to wait with stable until sure about the >>>>> exact set. >>>>> >>>> >>>> If its not good enough for upstream stable, then why would we apply it? >>>> >>> It is good enough. What I want to avoid is to miss something that >>> needs to go as >>> well. Though since .35 seems currently to break Xen (with a different >>> patch) I >>> probably am just too cautious. >>> And its hard to tell with many reporters some claim success, some >>> partial, some >>> none (at least for the lucid version) to say for sure. >>> >>> -Stefan >>> >> Ok I started discussion on xen-devel about this at least. >> > > Any forward motion on this ? > Nope, if there was a reply it was at least not cc back to me directly. And I have not scraped any archives.
diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 0769108..9c66deb 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -819,9 +819,6 @@ static void restore_cpu_virqs(unsigned int cpu) evtchn_to_irq[evtchn] = irq; irq_info[irq] = mk_virq_info(evtchn, virq); bind_evtchn_to_cpu(evtchn, cpu); - - /* Ready for use. */ - unmask_evtchn(evtchn); } } @@ -847,10 +844,6 @@ static void restore_cpu_ipis(unsigned int cpu) evtchn_to_irq[evtchn] = irq; irq_info[irq] = mk_ipi_info(evtchn, ipi); bind_evtchn_to_cpu(evtchn, cpu); - - /* Ready for use. */ - unmask_evtchn(evtchn); - } } @@ -903,6 +896,7 @@ void xen_poll_irq(int irq) void xen_irq_resume(void) { unsigned int cpu, irq, evtchn; + struct irq_desc *desc; init_evtchn_cpu_bindings(); @@ -921,6 +915,23 @@ void xen_irq_resume(void) restore_cpu_virqs(cpu); restore_cpu_ipis(cpu); } + + /* + * Unmask any IRQF_NO_SUSPEND IRQs which are enabled. These + * are not handled by the IRQ core. + */ + for_each_irq_desc(irq, desc) { + if (!desc->action || !(desc->action->flags & IRQF_NO_SUSPEND)) + continue; + if (desc->status & IRQ_DISABLED) + continue; + + evtchn = evtchn_from_irq(irq); + if (evtchn == -1) + continue; + + unmask_evtchn(evtchn); + } } static struct irq_chip xen_dynamic_chip __read_mostly = {