From patchwork Tue Feb 5 14:24:04 2013
From: Stefan Bader
Date: Tue, 05 Feb 2013 15:24:04 +0100
To: kernel-team@lists.ubuntu.com
Subject: Re: [ack-ish] Re: [Precise SRU] Call xen_poll_irq with interrupts disabled
Message-ID: <51111604.201@canonical.com>
In-Reply-To: <511114E2.9040408@canonical.com>
References: <1359995982-20657-1-git-send-email-stefan.bader@canonical.com>
 <20130205132556.GH10405@dm> <511114E2.9040408@canonical.com>
X-Patchwork-Id: 218262

On 05.02.2013 15:19, Stefan Bader wrote:
> On 05.02.2013 14:25, Andy Whitcroft wrote:
>> On Mon, Feb 04, 2013 at 05:39:41PM +0100, Stefan Bader wrote:
>>> This has been sitting around for a while now. The problem is that
>>> I was not yet able to find a convincing explanation *why* this is
>>> fixing things. But testing was very positive. So it's probably better
>>> to carry it as SAUCE for now, at least for Precise.
>>> I need to re-check things with latest kernels and Xen versions.
>>>
>>> -Stefan
>>>
>>> ---
>>>
>>> From e89064720b518e3c64f0fe56b11edb44125f8b7e Mon Sep 17 00:00:00 2001
>>> From: Stefan Bader
>>> Date: Thu, 18 Oct 2012 21:40:37 +0200
>>> Subject: [PATCH] UBUNTU: SAUCE: xen/pv-spinlock: Never enable interrupts in
>>>  xen_spin_lock_slow()
>>>
>>> The pv-ops spinlock code for Xen calls xen_spin_lock_slow()
>>> if a lock cannot be obtained quickly by normal spinning. The
>>> slow variant puts the VCPU onto a waiters list, then enables
>>> interrupts (which are event channel messages in this case) and
>>> calls a hypervisor function that is supposed to return when
>>> the lock is released.
>>>
>>> Using a test case which runs a lot of threads and causes high
>>> utilization of this spinlock code, it can be observed that the
>>> VCPU at the head of the waiters list seems to be doing other
>>> things (no trace of the hv call on its stack) while the waiters
>>> list still has it as the head. So other VCPUs which try to
>>> acquire the lock are stuck in the hv call forever, and sooner
>>> or later that ends up in some circular locking.
>>>
>>> By testing I can see this gets avoided when the interrupts are
>>> not enabled before the hv call (and disabled right after).
>>
>> I think this should read "... avoided if we leave interrupts off before
>> the call and remain off after it."
>
> It could be put that way as well. The code has interrupts disabled up to
> the point where the poll_irq hypercall is done. In the old form it could
> happen (depending on the state of interrupts before the spinlock call)
> that interrupts were enabled and got disabled again right after the call.
> So sometimes interrupts are enabled during the hypercall (see the
> before/after sketch at the end of this mail).
>
>>
>>> In theory this could cause a performance regression. Though it
>>> looked like the hypervisor would actually re-enable the interrupts
>>> after some safety measures. So maybe not much regression at all.
>>> At least testing did not yield a significant penalty.
>>>
>>> Xen PVM is the only affected setup because that is the only
>>> one using pv-ops spinlocks in a way that changes interrupt
>>> flags (KVM maps the variant which could change them to the
>>> function that does not modify those flags).
>>
>> I am taking this to say "KVM which uses a similar scheme also leaves the
>> interrupts off over the equivalent calls.".
>
> Right, there is a spinlock ops structure (maybe not that exact name) which
> has two spinlock interfaces: one that allows passing on the flags as they
> were, and one that does not care. KVM only uses the don't-care version,
> which disables interrupts and gives you no hint whether they were enabled
> before or not (a sketch of that structure is at the end of this mail).
>
>>
>>> BugLink: http://bugs.launchpad.net/bugs/1011792
>>>
>>> Signed-off-by: Stefan Bader
>>> ---
>>>  arch/x86/xen/spinlock.c | 23 +++++------------------
>>>  1 file changed, 5 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
>>> index d69cc6c..fa926ab 100644
>>> --- a/arch/x86/xen/spinlock.c
>>> +++ b/arch/x86/xen/spinlock.c
>>> @@ -24,7 +24,6 @@ static struct xen_spinlock_stats
>>>  	u32 taken_slow_nested;
>>>  	u32 taken_slow_pickup;
>>>  	u32 taken_slow_spurious;
>>> -	u32 taken_slow_irqenable;
>>>
>>>  	u64 released;
>>>  	u32 released_slow;
>>> @@ -197,7 +196,7 @@ static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock
>>>  	__this_cpu_write(lock_spinners, prev);
>>>  }
>>>
>>> -static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
>>> +static noinline int xen_spin_lock_slow(struct arch_spinlock *lock)
>>>  {
>>>  	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>>>  	struct xen_spinlock *prev;
>>> @@ -218,8 +217,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>>  	ADD_STATS(taken_slow_nested, prev != NULL);
>>>
>>>  	do {
>>> -		unsigned long flags;
>>> -
>>>  		/* clear pending */
>>>  		xen_clear_irq_pending(irq);
>>>
>>> @@ -239,12 +236,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>>  			goto out;
>>>  		}
>>>
>>> -		flags = arch_local_save_flags();
>>> -		if (irq_enable) {
>>> -			ADD_STATS(taken_slow_irqenable, 1);
>>> -			raw_local_irq_enable();
>>> -		}
>>> -
>>>  		/*
>>>  		 * Block until irq becomes pending.  If we're
>>>  		 * interrupted at this point (after the trylock but
>>> @@ -256,8 +247,6 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enab
>>>  		 */
>>>  		xen_poll_irq(irq);
>>>
>>> -		raw_local_irq_restore(flags);
>>> -
>>>  		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
>>>  	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
>>>
>>> @@ -270,7 +259,7 @@ out:
>>>  	return ret;
>>>  }
>>>
>>> -static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
>>> +static inline void __xen_spin_lock(struct arch_spinlock *lock)
>>>  {
>>>  	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>>>  	unsigned timeout;
>>> @@ -302,19 +291,19 @@ static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
>>>  		spin_time_accum_spinning(start_spin_fast);
>>>
>>>  	} while (unlikely(oldval != 0 &&
>>> -			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
>>> +			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock))));
>>>
>>>  	spin_time_accum_total(start_spin);
>>>  }
>>>
>>>  static void xen_spin_lock(struct arch_spinlock *lock)
>>>  {
>>> -	__xen_spin_lock(lock, false);
>>> +	__xen_spin_lock(lock);
>>>  }
>>>
>>>  static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
>>>  {
>>> -	__xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
>>> +	__xen_spin_lock(lock);
>>>  }
>>>
>>>  static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>>> @@ -424,8 +413,6 @@ static int __init xen_spinlock_debugfs(void)
>>>  			   &spinlock_stats.taken_slow_pickup);
>>>  	debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
>>>  			   &spinlock_stats.taken_slow_spurious);
>>> -	debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
>>> -			   &spinlock_stats.taken_slow_irqenable);
>>>
>>>  	debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
>>>  	debugfs_create_u32("released_slow", 0444, d_spin_debug,
>>
>> Based on the overall testing being so good, and on the fact that the
>> change seems to make us more like KVM, this ought to be good.
>>
>> As the fix is not yet upstream and still under discussion there, we
>> might want to 'reduce' this patch a little and just modify
>> xen_spin_lock_slow() to essentially accept but ignore the irq_enable
>> flag. It would reduce the code churn until we have an approved fix.
>>
>> Overall I think this has merit, and though I would prefer a smaller
>> code snippet for this, I think we do need this fix; if others are happy
>> with this larger patch I am happy:
>>
>> Acked-by: Andy Whitcroft
>>
>> -apw
>>
>
> Yeah, I could basically just comment out the two interrupt-changing lines
> in xen_spin_lock_slow() if that is preferred. It just did not seem to make
> sense to fake the statistics there.
>
> -Stefan
>

Not compile tested, but something like this:

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d69cc6c..6051359 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -242,7 +242,7 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock
 		flags = arch_local_save_flags();
 		if (irq_enable) {
 			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
+			/* raw_local_irq_enable(); */
 		}

 		/*
@@ -256,7 +256,7 @@ static noinline int xen_spin_lock_slow(struct arch_spinlock
 		 */
 		xen_poll_irq(irq);

-		raw_local_irq_restore(flags);
+		/* raw_local_irq_restore(flags); */

 		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
 	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
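
As a follow-up, to make the wording discussion above a bit more concrete:
below is a hand-written sketch (not the literal Precise source) of what the
full patch changes around the poll hypercall. All the calls shown appear in
the hunks above; in the old code the window with interrupts enabled only
existed for callers coming in through xen_spin_lock_flags() with interrupts
previously enabled.

	/* Old slow path (sketch): possibly enable interrupts around the poll. */
	flags = arch_local_save_flags();	/* remember the caller's irq state   */
	if (irq_enable)				/* caller had interrupts enabled ... */
		raw_local_irq_enable();		/* ... so events can be delivered    */
	xen_poll_irq(irq);			/* here and during the hypercall     */
	raw_local_irq_restore(flags);		/* back to the caller's irq state    */

	/* New slow path (sketch): interrupts simply stay off across the poll. */
	xen_poll_irq(irq);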
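
Also, for reference, the "spinlock ops" structure I mentioned is struct
pv_lock_ops in arch/x86/include/asm/paravirt_types.h. Quoting its layout
from memory (3.2-era, so the exact field list and names may differ
slightly):

	struct pv_lock_ops {
		int (*spin_is_locked)(struct arch_spinlock *lock);
		int (*spin_is_contended)(struct arch_spinlock *lock);
		void (*spin_lock)(struct arch_spinlock *lock);
		/* flags-aware variant; iirc the default implementation just
		 * takes the lock and ignores the flags (the "don't care" case) */
		void (*spin_lock_flags)(struct arch_spinlock *lock,
					unsigned long flags);
		int (*spin_trylock)(struct arch_spinlock *lock);
		void (*spin_unlock)(struct arch_spinlock *lock);
	};

Xen PVM overrides both lock entries and is the only user that actually
looks at the flags, which is why it is the only setup affected.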