[V2,1/1] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

Message ID 1618887697-1556-2-git-send-email-atrajeev@linux.vnet.ibm.com (mailing list archive)
State Superseded
Series powerpc/perf: Clear pending PMI in ppmu callbacks

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (40f5c8e99b3f2f53db08055f415af2aac416360e)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch warning total: 0 errors, 4 warnings, 1 checks, 158 lines checked
snowpatch_ozlabs/needsstable warning Please consider tagging this patch for stable!

Commit Message

Athira Rajeev April 20, 2021, 3:01 a.m. UTC
Running the perf fuzzer showed the below message in dmesg logs:
"Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMCs (Performance
Monitor Counters) were found to be overflown. There are some corner cases
that clear the PMCs after the PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow, which leads to this message while replaying.

Case 1: A PMU interrupt happens during replay of other interrupts and
counter values get cleared by PMU callbacks before the replay:

During replay of interrupts like timer, __do_irq and doorbell exception, we
conditionally enable interrupts via may_hard_irq_enable(). This could
potentially create a window to generate a PMI. Since the irq soft mask is
set to ALL_DISABLED, the PMI will get masked here. IPIs could run before
the perf interrupt is replayed, and the PMU events could get deleted or
stopped. This will change the PMU SPR values and reset the counters.
Snippet of an ftrace log showing the PMU callbacks invoked in "__do_irq":

<idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
<idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
<idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
<idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
<<>>
<idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
<idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
<idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
<idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
<idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
<idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
<idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
<idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
<idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
<idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
<idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
<idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
<<>>
<idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
<idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMIs masked during local_* operations, for example local_add.
If the local_add operation happens within a local_irq_save, replay of
the PMI will happen during local_irq_restore. Similar to case 1, this
could also create a window before the replay where PMU events get
deleted or stopped.

The patch updates the PMU callback functions (del, stop, enable) to check
for a pending perf interrupt. If there is an overflown PMC and a pending
perf interrupt indicated in the paca or by the PMAO bit set in MMCR0, clear
the PMI bit in the paca to drop that sample. Also clear the MMCR0 PMAO bit,
which otherwise could lead to spurious interrupts in some corner cases. For
example, a timer interrupt after power_pmu_del will re-enable interrupts,
since the PMI is cleared, and trigger a PMI again, since the PMAO bit is
still set. Another condition occurs if MSR[EE] was disabled right before
the perf interrupt came in: re-enabling interrupts will trigger a PMI since
PMAO is still set, but the handler fails to find a valid overflow if the
PMC got cleared before EE was enabled.

We can't just replay the PMI at any time. Hence this approach is preferred
rather than replaying the PMI before resetting the overflown PMC. The patch
also adds a comment in core-book3s about a race condition which can trigger
these PMC messages during the idle path in PowerNV.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hw_irq.h | 19 ++++++++
 arch/powerpc/perf/core-book3s.c   | 77 +++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

Comments

Nicholas Piggin May 17, 2021, 3:20 a.m. UTC | #1
Sorry I missed this :(

Excerpts from Athira Rajeev's message of April 20, 2021 1:01 pm:
> Running the perf fuzzer showed the below message in dmesg logs:
> "Can't find PMC that caused IRQ"
> 
> This means a PMU exception happened, but none of the PMCs (Performance
> Monitor Counters) were found to be overflown. There are some corner cases
> that clear the PMCs after the PMI gets masked. In such cases, the perf
> interrupt handler will not find the active PMC values that had caused
> the overflow, which leads to this message while replaying.
> 
> Case 1: A PMU interrupt happens during replay of other interrupts and
> counter values get cleared by PMU callbacks before the replay:
> 
> During replay of interrupts like timer, __do_irq and doorbell exception, we
> conditionally enable interrupts via may_hard_irq_enable(). This could
> potentially create a window to generate a PMI. Since the irq soft mask is
> set to ALL_DISABLED, the PMI will get masked here. IPIs could run before
> the perf interrupt is replayed, and the PMU events could get deleted or
> stopped. This will change the PMU SPR values and reset the counters.
> Snippet of an ftrace log showing the PMU callbacks invoked in "__do_irq":
> 
> <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
> <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
> <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
> <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
> <<>>
> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
> <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
> <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
> <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
> <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
> <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
> <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
> <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
> <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
> <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
> <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
> <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
> <<>>
> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
> <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
> 
> Case 2: PMIs masked during local_* operations, for example local_add.
> If the local_add operation happens within a local_irq_save, replay of
> the PMI will happen during local_irq_restore. Similar to case 1, this
> could also create a window before the replay where PMU events get
> deleted or stopped.
> 
> The patch updates the PMU callback functions (del, stop, enable) to check
> for a pending perf interrupt. If there is an overflown PMC and a pending
> perf interrupt indicated in the paca or by the PMAO bit set in MMCR0, clear
> the PMI bit in the paca to drop that sample. Also clear the MMCR0 PMAO bit,
> which otherwise could lead to spurious interrupts in some corner cases. For
> example, a timer interrupt after power_pmu_del will re-enable interrupts,
> since the PMI is cleared, and trigger a PMI again, since the PMAO bit is
> still set. Another condition occurs if MSR[EE] was disabled right before
> the perf interrupt came in: re-enabling interrupts will trigger a PMI since
> PMAO is still set, but the handler fails to find a valid overflow if the
> PMC got cleared before EE was enabled.
> 
> We can't just replay the PMI at any time. Hence this approach is preferred
> rather than replaying the PMI before resetting the overflown PMC. The patch
> also adds a comment in core-book3s about a race condition which can trigger
> these PMC messages during the idle path in PowerNV.
> 
> Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
> Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
> Suggested-by: Nicholas Piggin <npiggin@gmail.com>

I would say you can leave ^ this line out. You and Maddy did the hard 
work of coming up with the fix, I just suggested a few minor changes.

> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/hw_irq.h | 19 ++++++++
>  arch/powerpc/perf/core-book3s.c   | 77 +++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 56a98936a6a9..7e192bd8253b 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -215,6 +215,23 @@ static inline bool arch_irqs_disabled(void)
>  	return arch_irqs_disabled_flags(arch_local_save_flags());
>  }
>  
> +static inline int get_clear_pmi_irq_pending(void)
> +{
> +	/*
> +	 * Some corner cases could clear the PMU counter overflow
> +	 * while a masked PMI is pending. One of such case is
> +	 * when a PMI happens during interrupt replay and perf
> +	 * counter values gets cleared by PMU callbacks before
> +	 * replay. So the pending PMI must be cleared here.
> +	 */
> +	if (get_paca()->irq_happened & PACA_IRQ_PMI) {
> +		WARN_ON_ONCE(mfmsr() & MSR_EE);
> +		get_paca()->irq_happened &= ~PACA_IRQ_PMI;
> +		return 1;
> +	}
> +	return 0;
> +}

This is fine, if you respin it, you could make it return bool, and put 
the warning under if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)).
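
A minimal sketch of how that could look in a respin, assuming the helper is
otherwise unchanged:

static inline bool get_clear_pmi_irq_pending(void)
{
	/*
	 * Some corner cases could clear the PMU counter overflow
	 * while a masked PMI is pending, e.g. when a PMI happens
	 * during interrupt replay and perf counter values get
	 * cleared by PMU callbacks before the replay. So the
	 * pending PMI must be cleared here.
	 */
	if (get_paca()->irq_happened & PACA_IRQ_PMI) {
		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
			WARN_ON_ONCE(mfmsr() & MSR_EE);
		get_paca()->irq_happened &= ~PACA_IRQ_PMI;
		return true;
	}
	return false;
}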

> +
>  #ifdef CONFIG_PPC_BOOK3S
>  /*
>   * To support disabling and enabling of irq with PMI, set of
> @@ -391,6 +408,8 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
>  
>  static inline void may_hard_irq_enable(void) { }
>  
> +static inline int get_clear_pmi_irq_pending(void) { return 0; }
> +
>  static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
>  {
>  }
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 766f064f00fb..6ddac1b913fa 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -847,6 +847,20 @@ static void write_pmc(int idx, unsigned long val)
>  	}
>  }
>  
> +static int pmc_overflown(int idx)
> +{
> +	unsigned long val[8];
> +	int i;
> +
> +	for (i = 0; i < ppmu->n_counter; i++)
> +		val[i] = read_pmc(i + 1);
> +
> +	if ((int)val[idx-1] < 0)
> +		return 1;
> +
> +	return 0;

I don't know the PMU very well -- do you have to read all counters, or 
can you just read_pmc(idx)?

> +}
> +
>  /* Called from sysrq_handle_showregs() */
>  void perf_event_print_debug(void)
>  {
> @@ -1438,6 +1452,17 @@ static void power_pmu_enable(struct pmu *pmu)
>  		event = cpuhw->event[i];
>  		if (event->hw.idx && event->hw.idx != hwc_index[i] + 1) {
>  			power_pmu_read(event);
> +			/*
> +			 * if the PMC corresponding to event->hw.idx is
> +			 * overflown, check if there is any pending perf
> +			 * interrupt set in paca. If so, disable the interrupt
> +			 * by clearing the paca bit for PMI since we are going
> +			 * to reset the PMC. power_pmu_enable will reset PMAO bit
> +			 * of MMCR0 while enabling the event. So PMAO check
> +			 * is not needed here (versus PMAO check done in del/stop).
> +			 */
> +			if (pmc_overflown(event->hw.idx))
> +				get_clear_pmi_irq_pending();

power_pmu_read(event) is already reading the PMC, so I wonder if you 
could just take that value directly. It's in event->hw.prev_count, I 
think?

>  			write_pmc(event->hw.idx, 0);
>  			event->hw.idx = 0;
>  		}
> @@ -1474,6 +1499,10 @@ static void power_pmu_enable(struct pmu *pmu)
>  		event->hw.idx = idx;
>  		if (event->hw.state & PERF_HES_STOPPED)
>  			val = 0;
> +
> +		/* See above for get_clear_pmi_irq_pending */
> +		if (pmc_overflown(event->hw.idx))
> +			get_clear_pmi_irq_pending();

Sorry, not sure of why this is done here. Wouldn't the PMC have already 
been observed to be overflown and pending PMI cleared by the previous
check?

>  		write_pmc(idx, val);
>  
>  		perf_event_update_userpage(event);
> @@ -1619,6 +1648,7 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>  	struct cpu_hw_events *cpuhw;
>  	long i;
>  	unsigned long flags;
> +	unsigned long val_mmcr0;
>  
>  	local_irq_save(flags);
>  	perf_pmu_disable(event->pmu);
> @@ -1636,6 +1666,23 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>  			--cpuhw->n_events;
>  			ppmu->disable_pmc(event->hw.idx - 1, &cpuhw->mmcr);
>  			if (event->hw.idx) {
> +				/*
> +				 * if the PMC corresponding to event->hw.idx is
> +				 * overflown, check if there is any pending perf
> +				 * interrupt set in paca or indicated by PMAO bit
> +				 * in MMCR0. If so, disable the interrupt and clear
> +				 * the MMCR0 PMAO bit since we are going to reset
> +				 * the PMC and delete the event.
> +				 */
> +				if (pmc_overflown(event->hw.idx)) {
> +					if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
> +						val_mmcr0 = mfspr(SPRN_MMCR0);
> +						val_mmcr0 &= ~MMCR0_PMAO;
> +						write_mmcr0(cpuhw, val_mmcr0);
> +						mb();
> +						isync();
> +					}
> +				}

Again showing my ignorance of PMU and powerpc/perf, power_pmu_disable 
will have already cleared PMAO? How is it possible for it to become
set again here?

I can see how a PMI can be marked pending after local_irq_save and 
before perf_pmu_disable call... I wonder if clearing that pending bit 
should be done in power_pmu_disable (which matches the same place where 
we clear PMAO)?


>  				write_pmc(event->hw.idx, 0);
>  				event->hw.idx = 0;
>  			}
> @@ -1705,6 +1752,8 @@ static void power_pmu_start(struct perf_event *event, int ef_flags)
>  static void power_pmu_stop(struct perf_event *event, int ef_flags)
>  {
>  	unsigned long flags;
> +	unsigned long val_mmcr0;
> +	struct cpu_hw_events *cpuhw;
>  
>  	if (!event->hw.idx || !event->hw.sample_period)
>  		return;
> @@ -1713,8 +1762,27 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags)
>  		return;
>  
>  	local_irq_save(flags);
> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
>  	perf_pmu_disable(event->pmu);
>  
> +	/*
> +	 * if the PMC corresponding to event->hw.idx is
> +	 * overflown, check if there is any pending perf
> +	 * interrupt set in paca or indicated by PMAO bit
> +	 * in MMCR0. If so, disable the interrupt and clear
> +	 * the MMCR0 PMAO bit since we are going to reset
> +	 * the PMC.
> +	 */
> +	if (pmc_overflown(event->hw.idx)) {
> +		if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
> +			val_mmcr0 = mfspr(SPRN_MMCR0);
> +			val_mmcr0 &= ~MMCR0_PMAO;
> +			write_mmcr0(cpuhw, val_mmcr0);
> +			mb();
> +			isync();
> +		}
> +	}
> +
>  	power_pmu_read(event);
>  	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
>  	write_pmc(event->hw.idx, 0);

Same question for this one. Also would be good to put this pattern into 
a helper function if it has to be duplicated.

Thanks,
Nick

> @@ -2343,6 +2411,15 @@ static void __perf_event_interrupt(struct pt_regs *regs)
>  			}
>  		}
>  	}
> +
> +	/*
> +	 * During system wide profiling or while a specific CPU
> +	 * is monitored for an event, some corner cases could
> +	 * cause a PMC to overflow in the idle path. This will
> +	 * trigger a PMI after waking up from idle. Since counter
> +	 * values are _not_ saved/restored in the idle path, this
> +	 * can lead to the below "Can't find PMC" message.
> +	 */
>  	if (unlikely(!found) && !arch_irq_disabled_regs(regs))
>  		printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");
>  
> -- 
> 2.26.2
> 
>
Athira Rajeev May 23, 2021, 11:52 a.m. UTC | #2
> On 17-May-2021, at 8:50 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Sorry I missed this :(
> 
> Excerpts from Athira Rajeev's message of April 20, 2021 1:01 pm:
>> Running the perf fuzzer showed the below message in dmesg logs:
>> "Can't find PMC that caused IRQ"
>> 
>> This means a PMU exception happened, but none of the PMCs (Performance
>> Monitor Counters) were found to be overflown. There are some corner cases
>> that clear the PMCs after the PMI gets masked. In such cases, the perf
>> interrupt handler will not find the active PMC values that had caused
>> the overflow, which leads to this message while replaying.
>> 
>> Case 1: A PMU interrupt happens during replay of other interrupts and
>> counter values get cleared by PMU callbacks before the replay:
>> 
>> During replay of interrupts like timer, __do_irq and doorbell exception, we
>> conditionally enable interrupts via may_hard_irq_enable(). This could
>> potentially create a window to generate a PMI. Since the irq soft mask is
>> set to ALL_DISABLED, the PMI will get masked here. IPIs could run before
>> the perf interrupt is replayed, and the PMU events could get deleted or
>> stopped. This will change the PMU SPR values and reset the counters.
>> Snippet of an ftrace log showing the PMU callbacks invoked in "__do_irq":
>> 
>> <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
>> <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
>> <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
>> <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
>> <<>>
>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
>> <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
>> <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
>> <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
>> <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
>> <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
>> <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
>> <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
>> <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
>> <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
>> <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
>> <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
>> <<>>
>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
>> <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
>> 
>> Case 2: PMIs masked during local_* operations, for example local_add.
>> If the local_add operation happens within a local_irq_save, replay of
>> the PMI will happen during local_irq_restore. Similar to case 1, this
>> could also create a window before the replay where PMU events get
>> deleted or stopped.
>> 
>> The patch updates the PMU callback functions (del, stop, enable) to check
>> for a pending perf interrupt. If there is an overflown PMC and a pending
>> perf interrupt indicated in the paca or by the PMAO bit set in MMCR0, clear
>> the PMI bit in the paca to drop that sample. Also clear the MMCR0 PMAO bit,
>> which otherwise could lead to spurious interrupts in some corner cases. For
>> example, a timer interrupt after power_pmu_del will re-enable interrupts,
>> since the PMI is cleared, and trigger a PMI again, since the PMAO bit is
>> still set. Another condition occurs if MSR[EE] was disabled right before
>> the perf interrupt came in: re-enabling interrupts will trigger a PMI since
>> PMAO is still set, but the handler fails to find a valid overflow if the
>> PMC got cleared before EE was enabled.
>> 
>> We can't just replay the PMI at any time. Hence this approach is preferred
>> rather than replaying the PMI before resetting the overflown PMC. The patch
>> also adds a comment in core-book3s about a race condition which can trigger
>> these PMC messages during the idle path in PowerNV.
>> 
>> Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
>> Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
>> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> 
> I would say you can leave ^ this line out. You and Maddy did the hard 
> work of coming up with the fix, I just suggested a few minor changes.

Thanks Nick for reviewing the patch; your suggestions helped us with
the solution approach :)

> 
>> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/hw_irq.h | 19 ++++++++
>> arch/powerpc/perf/core-book3s.c   | 77 +++++++++++++++++++++++++++++++
>> 2 files changed, 96 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
>> index 56a98936a6a9..7e192bd8253b 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -215,6 +215,23 @@ static inline bool arch_irqs_disabled(void)
>> 	return arch_irqs_disabled_flags(arch_local_save_flags());
>> }
>> 
>> +static inline int get_clear_pmi_irq_pending(void)
>> +{
>> +	/*
>> +	 * Some corner cases could clear the PMU counter overflow
>> +	 * while a masked PMI is pending. One of such case is
>> +	 * when a PMI happens during interrupt replay and perf
>> +	 * counter values gets cleared by PMU callbacks before
>> +	 * replay. So the pending PMI must be cleared here.
>> +	 */
>> +	if (get_paca()->irq_happened & PACA_IRQ_PMI) {
>> +		WARN_ON_ONCE(mfmsr() & MSR_EE);
>> +		get_paca()->irq_happened &= ~PACA_IRQ_PMI;
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
> 
> This is fine, if you respin it, you could make it return bool, and put 
> the warning under if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)).

Sure, I will make this change in next version.

> 
>> +
>> #ifdef CONFIG_PPC_BOOK3S
>> /*
>>  * To support disabling and enabling of irq with PMI, set of
>> @@ -391,6 +408,8 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
>> 
>> static inline void may_hard_irq_enable(void) { }
>> 
>> +static inline int get_clear_pmi_irq_pending(void) { return 0; }
>> +
>> static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
>> {
>> }
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 766f064f00fb..6ddac1b913fa 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -847,6 +847,20 @@ static void write_pmc(int idx, unsigned long val)
>> 	}
>> }
>> 
>> +static int pmc_overflown(int idx)
>> +{
>> +	unsigned long val[8];
>> +	int i;
>> +
>> +	for (i = 0; i < ppmu->n_counter; i++)
>> +		val[i] = read_pmc(i + 1);
>> +
>> +	if ((int)val[idx-1] < 0)
>> +		return 1;
>> +
>> +	return 0;
> 
> I don't know the PMU very well -- do you have to read all counters, or 
> can you just read_pmc(idx)?

You are right Nick. Just read_pmc(idx) should be good here.
I will fix this in v3.
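
For example, a simplified version could be just the below (a sketch,
assuming reading only the requested counter is sufficient):

static int pmc_overflown(int idx)
{
	return (int)read_pmc(idx) < 0;
}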

> 
>> +}
>> +
>> /* Called from sysrq_handle_showregs() */
>> void perf_event_print_debug(void)
>> {
>> @@ -1438,6 +1452,17 @@ static void power_pmu_enable(struct pmu *pmu)
>> 		event = cpuhw->event[i];
>> 		if (event->hw.idx && event->hw.idx != hwc_index[i] + 1) {
>> 			power_pmu_read(event);
>> +			/*
>> +			 * if the PMC corresponding to event->hw.idx is
>> +			 * overflown, check if there is any pending perf
>> +			 * interrupt set in paca. If so, disable the interrupt
>> +			 * by clearing the paca bit for PMI since we are going
>> +			 * to reset the PMC. power_pmu_enable will reset PMAO bit
>> +			 * of MMCR0 while enabling the event. So PMAO check
>> +			 * is not needed here (versus PMAO check done in del/stop).
>> +			 */
>> +			if (pmc_overflown(event->hw.idx))
>> +				get_clear_pmi_irq_pending();
> 
> power_pmu_read(event) is already reading the PMC, so I wonder if you 
> could just take that value directly. It's in event->hw.prev_count, I 
> think?

Yes, Sure
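
As a sketch (assuming power_pmu_read() has just refreshed prev_count for
this event), the check could then avoid the extra read_pmc() call:

	power_pmu_read(event);
	/* power_pmu_read() leaves the latest PMC value in prev_count */
	if ((int)local64_read(&event->hw.prev_count) < 0)
		get_clear_pmi_irq_pending();
	write_pmc(event->hw.idx, 0);
	event->hw.idx = 0;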
> 
>> 			write_pmc(event->hw.idx, 0);
>> 			event->hw.idx = 0;
>> 		}
>> @@ -1474,6 +1499,10 @@ static void power_pmu_enable(struct pmu *pmu)
>> 		event->hw.idx = idx;
>> 		if (event->hw.state & PERF_HES_STOPPED)
>> 			val = 0;
>> +
>> +		/* See above for get_clear_pmi_irq_pending */
>> +		if (pmc_overflown(event->hw.idx))
>> +			get_clear_pmi_irq_pending();
> 
> Sorry, not sure of why this is done here. Wouldn't the PMC have already 
> been observed to be overflown and pending PMI cleared by the previous
> check?

The previous check is not always executed. The first check is done only if
any existing events need to be moved to a different PMC. So I have added
the check in both places where we update counter values.

> 
>> 		write_pmc(idx, val);
>> 
>> 		perf_event_update_userpage(event);
>> @@ -1619,6 +1648,7 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>> 	struct cpu_hw_events *cpuhw;
>> 	long i;
>> 	unsigned long flags;
>> +	unsigned long val_mmcr0;
>> 
>> 	local_irq_save(flags);
>> 	perf_pmu_disable(event->pmu);
>> @@ -1636,6 +1666,23 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>> 			--cpuhw->n_events;
>> 			ppmu->disable_pmc(event->hw.idx - 1, &cpuhw->mmcr);
>> 			if (event->hw.idx) {
>> +				/*
>> +				 * if the PMC corresponding to event->hw.idx is
>> +				 * overflown, check if there is any pending perf
>> +				 * interrupt set in paca or indicated by PMAO bit
>> +				 * in MMCR0. If so, disable the interrupt and clear
>> +				 * the MMCR0 PMAO bit since we are going to reset
>> +				 * the PMC and delete the event.
>> +				 */
>> +				if (pmc_overflown(event->hw.idx)) {
>> +					if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
>> +						val_mmcr0 = mfspr(SPRN_MMCR0);
>> +						val_mmcr0 &= ~MMCR0_PMAO;
>> +						write_mmcr0(cpuhw, val_mmcr0);
>> +						mb();
>> +						isync();
>> +					}
>> +				}
> 
> Again showing my ignorance of PMU and powerpc/perf, power_pmu_disable 
> will have already cleared PMAO? How is it possible for it to become
> set again here?
> 
> I can see how a PMI can be marked pending after local_irq_save and 
> before perf_pmu_disable call... I wonder if clearing that pending bit 
> should be done in power_pmu_disable (which matches the same place where 
> we clear PMAO)?

Nick,
The power_pmu_disable callback is not always called when power_pmu_del/stop is invoked.
Snippet from kernel/events/core.c for perf_pmu_disable

void perf_pmu_disable(struct pmu *pmu)
{
        int *count = this_cpu_ptr(pmu->pmu_disable_count);
        if (!(*count)++)
                pmu->pmu_disable(pmu);
}

Also, we want to clear the PMI specifically if any event that is going to get
deleted/stopped has an overflown PMC and a pending PMI. Hence the check is
added in the specific callbacks before we reset the PMCs.
> 
> 
>> 				write_pmc(event->hw.idx, 0);
>> 				event->hw.idx = 0;
>> 			}
>> @@ -1705,6 +1752,8 @@ static void power_pmu_start(struct perf_event *event, int ef_flags)
>> static void power_pmu_stop(struct perf_event *event, int ef_flags)
>> {
>> 	unsigned long flags;
>> +	unsigned long val_mmcr0;
>> +	struct cpu_hw_events *cpuhw;
>> 
>> 	if (!event->hw.idx || !event->hw.sample_period)
>> 		return;
>> @@ -1713,8 +1762,27 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags)
>> 		return;
>> 
>> 	local_irq_save(flags);
>> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
>> 	perf_pmu_disable(event->pmu);
>> 
>> +	/*
>> +	 * if the PMC corresponding to event->hw.idx is
>> +	 * overflown, check if there is any pending perf
>> +	 * interrupt set in paca or indicated by PMAO bit
>> +	 * in MMCR0. If so, disable the interrupt and clear
>> +	 * the MMCR0 PMAO bit since we are going to reset
>> +	 * the PMC.
>> +	 */
>> +	if (pmc_overflown(event->hw.idx)) {
>> +		if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
>> +			val_mmcr0 = mfspr(SPRN_MMCR0);
>> +			val_mmcr0 &= ~MMCR0_PMAO;
>> +			write_mmcr0(cpuhw, val_mmcr0);
>> +			mb();
>> +			isync();
>> +		}
>> +	}
>> +
>> 	power_pmu_read(event);
>> 	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
>> 	write_pmc(event->hw.idx, 0);
> 
> Same question for this one. Also would be good to put this pattern into 
> a helper function if it has to be duplicated.

Sure Nick, I will check on adding this pattern to a helper function.
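
The duplicated block could be folded into something like the below helper
(a sketch only; the name is just a placeholder):

static void pmi_irq_pending_clear(struct cpu_hw_events *cpuhw, int idx)
{
	unsigned long val_mmcr0;

	if (!pmc_overflown(idx))
		return;

	/*
	 * Clear any pending PMI recorded in the paca and, if the PMAO
	 * bit is still set, clear it in MMCR0 too, since the PMC is
	 * about to be reset.
	 */
	if (get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO)) {
		val_mmcr0 = mfspr(SPRN_MMCR0);
		val_mmcr0 &= ~MMCR0_PMAO;
		write_mmcr0(cpuhw, val_mmcr0);
		mb();
		isync();
	}
}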

Thanks
Athira
> 
> Thanks,
> Nick
> 
>> @@ -2343,6 +2411,15 @@ static void __perf_event_interrupt(struct pt_regs *regs)
>> 			}
>> 		}
>> 	}
>> +
>> +	/*
>> +	 * During system wide profiling or while a specific CPU
>> +	 * is monitored for an event, some corner cases could
>> +	 * cause a PMC to overflow in the idle path. This will
>> +	 * trigger a PMI after waking up from idle. Since counter
>> +	 * values are _not_ saved/restored in the idle path, this
>> +	 * can lead to the below "Can't find PMC" message.
>> +	 */
>> 	if (unlikely(!found) && !arch_irq_disabled_regs(regs))
>> 		printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");
>> 
>> -- 
>> 2.26.2

Patch

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 56a98936a6a9..7e192bd8253b 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -215,6 +215,23 @@  static inline bool arch_irqs_disabled(void)
 	return arch_irqs_disabled_flags(arch_local_save_flags());
 }
 
+static inline int get_clear_pmi_irq_pending(void)
+{
+	/*
+	 * Some corner cases could clear the PMU counter overflow
+	 * while a masked PMI is pending. One of such case is
+	 * when a PMI happens during interrupt replay and perf
+	 * counter values gets cleared by PMU callbacks before
+	 * replay. So the pending PMI must be cleared here.
+	 */
+	if (get_paca()->irq_happened & PACA_IRQ_PMI) {
+		WARN_ON_ONCE(mfmsr() & MSR_EE);
+		get_paca()->irq_happened &= ~PACA_IRQ_PMI;
+		return 1;
+	}
+	return 0;
+}
+
 #ifdef CONFIG_PPC_BOOK3S
 /*
  * To support disabling and enabling of irq with PMI, set of
@@ -391,6 +408,8 @@  static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 
 static inline void may_hard_irq_enable(void) { }
 
+static inline int get_clear_pmi_irq_pending(void) { return 0; }
+
 static inline void irq_soft_mask_regs_set_state(struct pt_regs *regs, unsigned long val)
 {
 }
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 766f064f00fb..6ddac1b913fa 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -847,6 +847,20 @@  static void write_pmc(int idx, unsigned long val)
 	}
 }
 
+static int pmc_overflown(int idx)
+{
+	unsigned long val[8];
+	int i;
+
+	for (i = 0; i < ppmu->n_counter; i++)
+		val[i] = read_pmc(i + 1);
+
+	if ((int)val[idx-1] < 0)
+		return 1;
+
+	return 0;
+}
+
 /* Called from sysrq_handle_showregs() */
 void perf_event_print_debug(void)
 {
@@ -1438,6 +1452,17 @@  static void power_pmu_enable(struct pmu *pmu)
 		event = cpuhw->event[i];
 		if (event->hw.idx && event->hw.idx != hwc_index[i] + 1) {
 			power_pmu_read(event);
+			/*
+			 * if the PMC corresponding to event->hw.idx is
+			 * overflown, check if there is any pending perf
+			 * interrupt set in paca. If so, disable the interrupt
+			 * by clearing the paca bit for PMI since we are going
+			 * to reset the PMC. power_pmu_enable will reset PMAO bit
+			 * of MMCR0 while enabling the event. So PMAO check
+			 * is not needed here (versus PMAO check done in del/stop).
+			 */
+			if (pmc_overflown(event->hw.idx))
+				get_clear_pmi_irq_pending();
 			write_pmc(event->hw.idx, 0);
 			event->hw.idx = 0;
 		}
@@ -1474,6 +1499,10 @@  static void power_pmu_enable(struct pmu *pmu)
 		event->hw.idx = idx;
 		if (event->hw.state & PERF_HES_STOPPED)
 			val = 0;
+
+		/* See above for get_clear_pmi_irq_pending */
+		if (pmc_overflown(event->hw.idx))
+			get_clear_pmi_irq_pending();
 		write_pmc(idx, val);
 
 		perf_event_update_userpage(event);
@@ -1619,6 +1648,7 @@  static void power_pmu_del(struct perf_event *event, int ef_flags)
 	struct cpu_hw_events *cpuhw;
 	long i;
 	unsigned long flags;
+	unsigned long val_mmcr0;
 
 	local_irq_save(flags);
 	perf_pmu_disable(event->pmu);
@@ -1636,6 +1666,23 @@  static void power_pmu_del(struct perf_event *event, int ef_flags)
 			--cpuhw->n_events;
 			ppmu->disable_pmc(event->hw.idx - 1, &cpuhw->mmcr);
 			if (event->hw.idx) {
+				/*
+				 * if the PMC corresponding to event->hw.idx is
+				 * overflown, check if there is any pending perf
+				 * interrupt set in paca or indicated by PMAO bit
+				 * in MMCR0. If so, disable the interrupt and clear
+				 * the MMCR0 PMAO bit since we are going to reset
+				 * the PMC and delete the event.
+				 */
+				if (pmc_overflown(event->hw.idx)) {
+					if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
+						val_mmcr0 = mfspr(SPRN_MMCR0);
+						val_mmcr0 &= ~MMCR0_PMAO;
+						write_mmcr0(cpuhw, val_mmcr0);
+						mb();
+						isync();
+					}
+				}
 				write_pmc(event->hw.idx, 0);
 				event->hw.idx = 0;
 			}
@@ -1705,6 +1752,8 @@  static void power_pmu_start(struct perf_event *event, int ef_flags)
 static void power_pmu_stop(struct perf_event *event, int ef_flags)
 {
 	unsigned long flags;
+	unsigned long val_mmcr0;
+	struct cpu_hw_events *cpuhw;
 
 	if (!event->hw.idx || !event->hw.sample_period)
 		return;
@@ -1713,8 +1762,27 @@  static void power_pmu_stop(struct perf_event *event, int ef_flags)
 		return;
 
 	local_irq_save(flags);
+	cpuhw = this_cpu_ptr(&cpu_hw_events);
 	perf_pmu_disable(event->pmu);
 
+	/*
+	 * if the PMC corresponding to event->hw.idx is
+	 * overflown, check if there is any pending perf
+	 * interrupt set in paca or indicated by PMAO bit
+	 * in MMCR0. If so, disable the interrupt and clear
+	 * the MMCR0 PMAO bit since we are going to reset
+	 * the PMC.
+	 */
+	if (pmc_overflown(event->hw.idx)) {
+		if ((get_clear_pmi_irq_pending() | (mfspr(SPRN_MMCR0) & MMCR0_PMAO))) {
+			val_mmcr0 = mfspr(SPRN_MMCR0);
+			val_mmcr0 &= ~MMCR0_PMAO;
+			write_mmcr0(cpuhw, val_mmcr0);
+			mb();
+			isync();
+		}
+	}
+
 	power_pmu_read(event);
 	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
 	write_pmc(event->hw.idx, 0);
@@ -2343,6 +2411,15 @@  static void __perf_event_interrupt(struct pt_regs *regs)
 			}
 		}
 	}
+
+	/*
+	 * During system wide profiling or while a specific CPU
+	 * is monitored for an event, some corner cases could
+	 * cause a PMC to overflow in the idle path. This will
+	 * trigger a PMI after waking up from idle. Since counter
+	 * values are _not_ saved/restored in the idle path, this
+	 * can lead to the below "Can't find PMC" message.
+	 */
 	if (unlikely(!found) && !arch_irq_disabled_regs(regs))
 		printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");