Random scheduler/unaligned accesses crashes with perf lock events on sparc 64

From: Frederic Weisbecker <fweisbec@gmail.com>

From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Mon, 5 Apr 2010 21:40:58 +0200

> It happens without CONFIG_FUNCTION_TRACER as well (but it happens
> when the function tracer runs). And I hadn't your
> perf_arch_save_caller_regs() when I triggered this.

I figured out the problem, it's NMIs.  As soon as I disable all of the
NMI watchdog code, the problem goes away.

This is because some parts of the NMI interrupt handling path are not
marked with "notrace" and the various tracer code paths use
local_irq_disable() (either directly or indirectly) which doesn't work
with sparc64's NMI scheme.  These essentially turn NMIs back on in the
NMI handler before the NMI condition has been cleared, and thus we can
re-enter with another NMI interrupt.

We went through this for perf events, and we just made sure that
local_irq_{enable,disable}() never occurs in any of the code paths in
perf events that can be reached via the NMI interrupt handler.  (the
only one we had was sched_clock() and that was easily fixed)

So, the first mcount hit we get is for rcu_nmi_enter() via
nmi_enter().

I can see two ways to handle this:

1) Pepper 'notrace' markers onto rcu_nmi_enter(), rcu_nmi_exit()
   and whatever else I can see getting hit in the NMI interrupt
   handler code paths.

2) Add a hack to __raw_local_irq_save() that keeps it from writing
   anything to the interrupt level register if we have NMI's disabled.
   (this puts the cost on the entire kernel instead of just the NMI
   paths).

#1 seems to be the intent on other platforms, the majority of the NMI
code paths are protected with 'notrace' on x86, I bet nobody noticed
that nmi_enter() when CONFIG_NO_HZ && !CONFIG_TINY_RCU ends up calling
a function that does tracing.

The next one we'll hit is atomic_notifier_call_chain()  (amusingly
notify_die() is marked 'notrace' but the one thing it calls isn't)

For example, the following are the generic notrace annotations I
would need to get sparc64 ftrace functioning again. (Frederic I will
send you the full patch with the sparc specific bits under seperate
cover in so that you can test things...)

--------------------
kernel: Add notrace annotations to common routines invoked via NMI.

This includes the atomic notifier call chain as well as the RCU
specific NMI enter/exit handlers.

Signed-off-by: David S. Miller <davem@davemloft.net>

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	20100406.025049.267615796.davem@davemloft.net
State	RFC
Delegated to:	David Miller
Headers	show Return-Path: <sparclinux-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 5B863B7CF1 for <patchwork-incoming@ozlabs.org>; Tue, 6 Apr 2010 19:51:06 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752650Ab0DFJus (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Tue, 6 Apr 2010 05:50:48 -0400 Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:54049 "EHLO sunset.davemloft.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752286Ab0DFJur (ORCPT <rfc822; sparclinux@vger.kernel.org>); Tue, 6 Apr 2010 05:50:47 -0400 Received: from localhost (localhost [127.0.0.1]) by sunset.davemloft.net (Postfix) with ESMTP id 1F43924C090; Tue, 6 Apr 2010 02:50:50 -0700 (PDT) Date: Tue, 06 Apr 2010 02:50:49 -0700 (PDT) Message-Id: <20100406.025049.267615796.davem@davemloft.net> To: fweisbec@gmail.com Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@elte.hu, acme@redhat.com, a.p.zijlstra@chello.nl, paulus@samba.org Subject: Re: Random scheduler/unaligned accesses crashes with perf lock events on sparc 64 From: David Miller <davem@davemloft.net> In-Reply-To: <20100405194055.GA5265@nowhere> References: <20100405065701.GC5127@nowhere> <20100405.122233.188421941.davem@davemloft.net> <20100405194055.GA5265@nowhere> X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: sparclinux-owner@vger.kernel.org Precedence: bulk List-ID: <sparclinux.vger.kernel.org> X-Mailing-List: sparclinux@vger.kernel.org

Random scheduler/unaligned accesses crashes with perf lock events on sparc 64

Commit Message

Comments

Patch