From patchwork Wed Mar 30 23:51:49 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kamal Mostafa <kamal@canonical.com>
X-Patchwork-Id: 603930
Return-Path: <kernel-team-bounces@lists.ubuntu.com>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from huckleberry.canonical.com (huckleberry.canonical.com
	[91.189.94.19])
	by ozlabs.org (Postfix) with ESMTP id 3qb52929DRz9s5g;
	Thu, 31 Mar 2016 11:27:45 +1100 (AEDT)
Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com)
	by huckleberry.canonical.com with esmtp (Exim 4.76)
	(envelope-from <kernel-team-bounces@lists.ubuntu.com>)
	id 1alQSa-00089a-Jy; Thu, 31 Mar 2016 00:27:40 +0000
Received: from youngberry.canonical.com ([91.189.89.112])
	by huckleberry.canonical.com with esmtps
	(TLS1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.76) (envelope-from <kamal@canonical.com>) id 1alPtx-0004OD-OF
	for kernel-team@lists.ubuntu.com; Wed, 30 Mar 2016 23:51:53 +0000
Received: from 1.general.kamal.us.vpn ([10.172.68.52] helo=fourier)
	by youngberry.canonical.com with esmtpsa
	(TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.76) (envelope-from <kamal@canonical.com>)
	id 1alPtx-0007Rk-2d; Wed, 30 Mar 2016 23:51:53 +0000
Received: from kamal by fourier with local (Exim 4.86_2)
	(envelope-from <kamal@whence.com>)
	id 1alPtu-0000fv-CM; Wed, 30 Mar 2016 16:51:50 -0700
From: Kamal Mostafa <kamal@canonical.com>
To: Kan Liang <kan.liang@intel.com>
Subject: [4.2.y-ckt stable] Patch "perf/x86/intel: Fix PEBS warning by only
	restoring active PMU in pmi" has been added to the 4.2.y-ckt tree
Date: Wed, 30 Mar 2016 16:51:49 -0700
Message-Id: <1459381909-2553-1-git-send-email-kamal@canonical.com>
X-Mailer: git-send-email 2.7.4
X-Extended-Stable: 4.2
X-Mailman-Approved-At: Thu, 31 Mar 2016 00:26:57 +0000
Cc: Vince Weaver <vincent.weaver@maine.edu>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kamal Mostafa <kamal@canonical.com>,
	Stephane Eranian <eranian@google.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	kernel-team@lists.ubuntu.com, Thomas Gleixner <tglx@linutronix.de>,
	Jiri Olsa <jolsa@redhat.com>, Ingo Molnar <mingo@kernel.org>
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Kernel team discussions <kernel-team.lists.ubuntu.com>
List-Unsubscribe: <https://lists.ubuntu.com/mailman/options/kernel-team>,
	<mailto:kernel-team-request@lists.ubuntu.com?subject=unsubscribe>
List-Archive: <https://lists.ubuntu.com/archives/kernel-team>
List-Post: <mailto:kernel-team@lists.ubuntu.com>
List-Help: <mailto:kernel-team-request@lists.ubuntu.com?subject=help>
List-Subscribe: <https://lists.ubuntu.com/mailman/listinfo/kernel-team>,
	<mailto:kernel-team-request@lists.ubuntu.com?subject=subscribe>
MIME-Version: 1.0
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: kernel-team-bounces@lists.ubuntu.com

This is a note to let you know that I have just added a patch titled

    perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi

to the linux-4.2.y-queue branch of the 4.2.y-ckt extended stable tree 
which can be found at:

    http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-4.2.y-queue

This patch is scheduled to be released in version 4.2.8-ckt7.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 4.2.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

---8<------------------------------------------------------------

From 2a659dd551e1f0ea4e287e10c848a40d9319cfd8 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@intel.com>
Date: Thu, 3 Mar 2016 18:07:28 -0500
Subject: perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi

commit c3d266c8a9838cc141b69548bc3b1b18808ae8c4 upstream.

This patch tries to fix a PEBS warning found in my stress test. The
following perf command can easily trigger the pebs warning or spurious
NMI error on Skylake/Broadwell/Haswell platforms:

  sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a

Also the NMI watchdog must be enabled.

For this case, the events number is larger than counter number. So
perf has to do multiplexing.

In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out
old events, rotate_ctx, schedule in new events and finally
perf_pmu_enable().

If the old events include precise event, the MSR_IA32_PEBS_ENABLE
should be cleared when perf_pmu_disable().  The MSR_IA32_PEBS_ENABLE
should keep 0 until the perf_pmu_enable() is called and the new event is
precise event.

However, there is a corner case which could restore PEBS_ENABLE to
stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL
will be set to 0 to stop overflow and followed PMI. But there may be
pending PMI from an earlier overflow, which cannot be stopped. So even
GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At
the end of the PMI handler, __intel_pmu_enable_all() will be called,
which will restore the stale values if old events haven't scheduled
out.

Once the stale pebs value is set, it's impossible to be corrected if
the new events are non-precise. Because the pebs_enabled will be set
to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE
setting. As a result, the following NMI with stale PEBS_ENABLE
trigger pebs warning.

The pending PMI after enabled=0 will become harmless if the NMI handler
does not change the state. This patch checks cpuc->enabled in pmi and
only restore the state when PMU is active.

Here is the dump:

  Call Trace:
   <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
   [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
   [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
   [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
   [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
   [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
   [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
   [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
   [<ffffffff810310b9>] nmi_handle+0x69/0x120
   [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
   [<ffffffff810317f2>] do_nmi+0xe2/0x130
   [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
   [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
   [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
   [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
   [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
   [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
   [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
   [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
   [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
   [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
   <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
   [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff8119765a>] event_function_call+0x10a/0x120
   [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
   [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
   [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
   [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
   [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
   [<ffffffff81036d29>] ? sched_clock+0x9/0x10
   [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
   [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
  ---[ end trace aef202839fe9a71d ]---
  Uhhuh. NMI received for unknown reason 2d on CPU 2.
  Do you have a strange power saving mode enabled?

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
[ Fixed various typos and other small details. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 4.2-stable: files renamed ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 arch/x86/kernel/cpu/perf_event.c       | 13 +++++++++++++
 arch/x86/kernel/cpu/perf_event_intel.c | 15 +++++++++++++--
 arch/x86/kernel/cpu/perf_event_knc.c   |  4 +++-
 3 files changed, 29 insertions(+), 3 deletions(-)

--
2.7.4

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index a27d738..d93690c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -593,6 +593,19 @@ void x86_pmu_disable_all(void)
 	}
 }

+/*
+ * There may be PMI landing after enabled=0. The PMI hitting could be before or
+ * after disable_all.
+ *
+ * If PMI hits before disable_all, the PMU will be disabled in the NMI handler.
+ * It will not be re-enabled in the NMI handler again, because enabled=0. After
+ * handling the NMI, disable_all will be called, which will not change the
+ * state either. If PMI hits after disable_all, the PMU is already disabled
+ * before entering NMI handler. The NMI handler will not change the state
+ * either.
+ *
+ * So either situation is harmless.
+ */
 static void x86_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 1b09c42..9af4ba8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1244,7 +1244,15 @@ static __initconst const u64 slm_hw_cache_event_ids
 };

 /*
- * Use from PMIs where the LBRs are already disabled.
+ * Used from PMIs where the LBRs are already disabled.
+ *
+ * This function could be called consecutively. It is required to remain in
+ * disabled state if called consecutively.
+ *
+ * During consecutive calls, the same disable value will be written to related
+ * registers, so the PMU state remains unchanged. hw.state in
+ * intel_bts_disable_local will remain PERF_HES_STOPPED too in consecutive
+ * calls.
  */
 static void __intel_pmu_disable_all(void)
 {
@@ -1670,7 +1678,10 @@ again:
 		goto again;

 done:
-	__intel_pmu_enable_all(0, true);
+	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
+	if (cpuc->enabled)
+		__intel_pmu_enable_all(0, true);
+
 	/*
 	 * Only unmask the NMI after the overflow counters
 	 * have been reset. This avoids spurious NMIs on
diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 5b0c232..b931095 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -263,7 +263,9 @@ again:
 		goto again;

 done:
-	knc_pmu_enable_all(0);
+	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
+	if (cpuc->enabled)
+		knc_pmu_enable_all(0);

 	return handled;
 }