From patchwork Tue Apr 3 17:20:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Salisbury X-Patchwork-Id: 894718 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 40FwpK3gdlz9s1r; Wed, 4 Apr 2018 03:20:25 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1f3PbV-0001ay-CW; Tue, 03 Apr 2018 17:20:17 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1f3PbU-0001ak-3U for kernel-team@lists.ubuntu.com; Tue, 03 Apr 2018 17:20:16 +0000 Received: from 1.general.jsalisbury.us.vpn ([10.172.67.212] helo=salisbury) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1f3PbT-0005wh-P6 for kernel-team@lists.ubuntu.com; Tue, 03 Apr 2018 17:20:15 +0000 Received: by salisbury (Postfix, from userid 1000) id 89C0D7E4A5F; Tue, 3 Apr 2018 13:20:14 -0400 (EDT) From: Joseph Salisbury To: kernel-team@lists.ubuntu.com Subject: [Bionic][PATCH 1/1] powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened Date: Tue, 3 Apr 2018 13:20:14 -0400 Message-Id: <5540e56d9b9a2998b73b11363670a8d9be46c4a4.1522775475.git.joseph.salisbury@canonical.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: References: In-Reply-To: References: X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Nicholas Piggin BugLink: http://bugs.launchpad.net/bugs/1757497 force_external_irq_replay() can be called in the do_IRQ path with interrupts hard enabled and soft disabled if may_hard_irq_enable() set MSR[EE]=1. It updates local_paca->irq_happened with a load, modify, store sequence. If a maskable interrupt hits during this sequence, it will go to the masked handler to be marked pending in irq_happened. This update will be lost when the interrupt returns and the store instruction executes. This can result in unpredictable latencies, timeouts, lockups, etc. Fix this by ensuring hard interrupts are disabled before modifying irq_happened. This could cause any maskable asynchronous interrupt to get lost, but it was noticed on P9 SMP system doing RDMA NVMe target over 100GbE, so very high external interrupt rate and high IPI rate. The hang was bisected down to enabling doorbell interrupts for IPIs. These provided an interrupt type that could run at high rates in the do_IRQ path, stressing the race. Fixes: 1d607bb3bd60 ("powerpc/irq: Add mechanism to force a replay of interrupts") Cc: stable@vger.kernel.org # v4.8+ Reported-by: Carol L. Soto Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman (cherry picked from commit ff6781fd1bb404d8a551c02c35c70cec1da17ff1) Signed-off-by: Joseph Salisbury --- arch/powerpc/kernel/irq.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index b7a8452..c933632 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -475,6 +475,14 @@ void force_external_irq_replay(void) */ WARN_ON(!arch_irqs_disabled()); + /* + * Interrupts must always be hard disabled before irq_happened is + * modified (to prevent lost update in case of interrupt between + * load and store). + */ + __hard_irq_disable(); + local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + /* Indicate in the PACA that we have an interrupt to replay */ local_paca->irq_happened |= PACA_IRQ_EE; }