[v4,0/5] powerpc/perf: IMC trace-mode support

Message ID 20190415101204.15125-1-anju@linux.vnet.ibm.com (mailing list archive)

Message

Anju T Sudhakar April 15, 2019, 10:11 a.m. UTC
IMC (In-Memory Collection counters) is a hardware monitoring facility
that collects a large number of hardware performance events.
POWER9 supports two modes for IMC: Accumulation mode and Trace mode.
In Accumulation mode, event counts are accumulated in system memory.
The hypervisor then reads the posted counts periodically or when
requested. In IMC Trace mode, the 64-bit trace SCOM value is initialized
with the event information. The CPMC*SEL and CPMC_LOAD fields in the
trace SCOM specify the event to be monitored and the sampling duration.
On each overflow in the CPMC*SEL, hardware snapshots the program counter
along with event counts and writes them into the memory pointed to by
LDBAR. LDBAR has bits to indicate whether hardware is configured for
accumulation or trace mode.
Currently the event monitored for trace mode is fixed as cycles.

Trace-IMC Implementation:                                                  
--------------------------                                                 
To enable trace-imc, we need to                                            
								    
* Add trace node in the DTS file for power9, so that the new trace node can
be discovered by the kernel.                                               
								    
The information included in the DTS file is as follows (a snippet from
the ima-catalog):
								    
TRACE_IMC: trace-events {
     #address-cells = <0x1>;
     #size-cells = <0x1>;
     event@10200000 {
          event-name = "cycles";
          reg = <0x10200000 0x8>;
          desc = "Reference cycles";
     };
};
trace@0 {
     compatible = "ibm,imc-counters";
     events-prefix = "trace_";
     reg = <0x0 0x8>;
     events = < &TRACE_IMC >;
     type = <0x2>;
     size = <0x40000>;
};
								    
The OP-BUILD changes needed to include the "trace node" are already
pulled into the ima-catalog repo.
								    
https://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07
								    
* Enhance the opal_imc_counters_* calls to support this new trace mode
in IMC. Add support to initialize the trace-mode SCOM.
								    
TRACE_IMC_SCOM bit representation:                                         
								    
0:1     : SAMPSEL                                                          
2:33    : CPMC_LOAD                                                        
34:40   : CPMC1SEL                                                         
41:47   : CPMC2SEL                                                         
48:50   : BUFFERSIZE                                                       
51:63   : RESERVED                                                         
								    
CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determine
the event to count. BUFFERSIZE indicates the memory range. On each
overflow, hardware snapshots the program counter along with event counts,
updates the memory, and reloads the CPMC_LOAD value for the next sampling
duration. IMC hardware does not support exceptions, so it quietly wraps
around if the memory buffer reaches the end.

OPAL support for IMC trace mode is already upstream.
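
To make the TRACE_IMC_SCOM layout above concrete, here is a minimal
user-space sketch that packs the five fields into a 64-bit value. It
assumes the usual Power convention that bit 0 is the most significant
bit. The helper names (field_mask, set_field, trace_scom_pack) are
made up for illustration and are not taken from the patches or from
OPAL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: IBM bit numbering, i.e. bit 0 is the MSB of the 64-bit
 * register, so field first:last ends (63 - last) bits above the LSB.
 */
static uint64_t field_mask(unsigned first, unsigned last)
{
	unsigned width = last - first + 1;
	uint64_t ones = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return ones << (63 - last);
}

/* Place val into bits first:last of reg (hypothetical helper). */
static uint64_t set_field(uint64_t reg, unsigned first, unsigned last,
			  uint64_t val)
{
	unsigned shift = 63 - last;
	uint64_t mask = field_mask(first, last);

	assert(val <= (mask >> shift));	/* value must fit in the field */
	return (reg & ~mask) | (val << shift);
}

/* Pack the fields listed in the TRACE_IMC_SCOM table; 51:63 stay zero. */
uint64_t trace_scom_pack(uint64_t sampsel, uint64_t cpmc_load,
			 uint64_t cpmc1sel, uint64_t cpmc2sel,
			 uint64_t buffersize)
{
	uint64_t v = 0;

	v = set_field(v, 0, 1, sampsel);	/*  0:1  SAMPSEL    */
	v = set_field(v, 2, 33, cpmc_load);	/*  2:33 CPMC_LOAD  */
	v = set_field(v, 34, 40, cpmc1sel);	/* 34:40 CPMC1SEL   */
	v = set_field(v, 41, 47, cpmc2sel);	/* 41:47 CPMC2SEL   */
	v = set_field(v, 48, 50, buffersize);	/* 48:50 BUFFERSIZE */
	return v;
}
```

With this numbering the reserved bits 51:63 are the low 13 bits of the
value, so a fully populated register has them all clear.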

* Set the LDBAR SPR to enable IMC trace mode.
                                                                             
  LDBAR Layout:                                                              
                                                                             
  0     : Enable/Disable                                                     
  1     : 0 -> Accumulation Mode                                             
          1 -> Trace Mode                                                    
  2:3   : Reserved                                                           
  4:6   : PB scope
  7     : Reserved                                                           
  8:50  : Counter Address                                                    
  51:63 : Reserved     
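
As an illustration of the LDBAR layout above, the following sketch
composes an LDBAR value in plain C. It follows the table literally and
assumes bit 0 is the most significant bit, and that the counter
address is masked into bits 8:50 in place (not shifted), which would
imply an 8 KB aligned buffer. The macro and function names are
hypothetical; the patches may define their constants differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IBM bit numbering, bit 0 is the MSB of the register. */
#define LDBAR_ENABLE		(1ULL << 63)		/* bit 0        */
#define LDBAR_TRACE_MODE	(1ULL << 62)		/* bit 1        */
#define LDBAR_SCOPE_SHIFT	57			/* bits 4:6     */
#define LDBAR_ADDR_MASK		0x00FFFFFFFFFFE000ULL	/* bits 8:50    */

/* Build an LDBAR value from its fields (illustrative only). */
uint64_t ldbar_compose(bool enable, bool trace_mode, unsigned pb_scope,
		       uint64_t buf_addr)
{
	uint64_t v;

	assert(pb_scope <= 7);				/* 3-bit field */
	assert((buf_addr & ~LDBAR_ADDR_MASK) == 0);	/* aligned, in range */

	v = buf_addr;
	v |= (uint64_t)pb_scope << LDBAR_SCOPE_SHIFT;
	if (enable)
		v |= LDBAR_ENABLE;
	if (trace_mode)
		v |= LDBAR_TRACE_MODE;
	return v;
}

/* True if bit 1 selects trace mode, false for accumulation mode. */
bool ldbar_is_trace_mode(uint64_t ldbar)
{
	return (ldbar & LDBAR_TRACE_MODE) != 0;
}
```

In the kernel the resulting value would be written to the SPR rather
than returned; that part is left out here because it cannot run in
user space.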

----------------------

PMI handling is avoided, since IMC trace mode snapshots the program
counter and writes it to memory. This also provides a way for the
operating system to do instruction sampling in real time without
PMI (Performance Monitoring Interrupt) processing overhead.

Performance data using `perf top` with and without the trace-imc event:
								    
PMI count when `perf top` is executed without the trace-imc event:
								    
# cat /proc/interrupts  (a snippet from the output)                        
9944      1072        804        804       1644        804       1306      
804        804        804        804        804        804        804      
804        804       1961       1602        804        804       1258      
[-----------------------------------------------------------------]        
803        803        803        803        803        803        803      
803        803        803        803        804        804        804     
804        804        804        804        804        804        803     
803        803        803        803        803       1306        803     
803   Performance monitoring interrupts                                   
								    
								    
`perf top` with the trace-imc event (executed right after `perf top` without it):
								    
# perf top -e trace_imc/trace_cycles/                                      
12.50%  [kernel]          [k] arch_cpu_idle                            
11.81%  [kernel]          [k] __next_timer_interrupt                   
11.22%  [kernel]          [k] rcu_idle_enter                           
10.25%  [kernel]          [k] find_next_bit                            
 7.91%  [kernel]          [k] do_idle                                  
 7.69%  [kernel]          [k] rcu_dynticks_eqs_exit                    
 5.20%  [kernel]          [k] tick_nohz_idle_stop_tick                 
     [-----------------------]                                      
								    
# cat /proc/interrupts (a snippet from the output)                         
								    
9944      1072        804        804       1644        804       1306      
804        804        804        804        804        804        804      
804        804       1961       1602        804        804       1258      
[-----------------------------------------------------------------]        
803        803        803        803        803        803        803      
803        803        803        804        804        804        804
804        804        804        804        804        804        803     
803        803        803        803        803       1306        803     
803   Performance monitoring interrupts                                   
								    
The PMI count remains the same.
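
One way to compare the two /proc/interrupts snapshots without reading
the columns by eye is to sum the per-CPU counters of the PMI row before
and after the run. The sketch below does that for a single row passed
in as a string. The function name is hypothetical, and it assumes the
row is an optional "LABEL:" prefix followed by decimal counters and
then the description text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters of one /proc/interrupts row.
 * Parsing stops at the first token that does not start with a digit,
 * i.e. at the trailing description.
 */
uint64_t sum_irq_counts(const char *row)
{
	const char *p = row;
	const char *colon;
	uint64_t total = 0;

	assert(row != NULL);

	/* Skip an optional "LABEL:" prefix if the row has one. */
	colon = strchr(row, ':');
	if (colon)
		p = colon + 1;

	for (;;) {
		char *end;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;
		total += strtoull(p, &end, 10);
		p = end;
	}
	return total;
}
```

If the sums taken before and after `perf top -e trace_imc/trace_cycles/`
are equal, no PMIs were raised during the run.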


Changelog:
----------
From v3 -> v4:

* trace_imc_refc is introduced, so that trace-imc can be used even if
core-imc is disabled.

* trace_imc_pmu_sched_task is removed and OPAL start/stop
is invoked in the trace_imc_event_add/del functions.

 
Suggestions/comments are welcome.

Anju T Sudhakar (4):
  powerpc/include: Add data structures and macros for IMC trace mode
  powerpc/perf: Rearrange setting of ldbar for thread-imc
  powerpc/perf: Trace imc events detection and cpuhotplug
  powerpc/perf: Trace imc PMU functions

Madhavan Srinivasan (1):
  powerpc/perf: Add privileged access check for thread_imc

 arch/powerpc/include/asm/imc-pmu.h        |  39 +++
 arch/powerpc/include/asm/opal-api.h       |   1 +
 arch/powerpc/perf/imc-pmu.c               | 318 +++++++++++++++++++++-
 arch/powerpc/platforms/powernv/opal-imc.c |   3 +
 include/linux/cpuhotplug.h                |   1 +
 5 files changed, 351 insertions(+), 11 deletions(-)

Comments

Anju T Sudhakar April 16, 2019, 9:44 a.m. UTC | #1
Hi,

Kindly ignore this series, since patch 5/5 in this series doesn't
incorporate the event-format change that I've done in v4 of this series.

Apologies for the inconvenience. I will post the updated v5 soon.


Thanks,

Anju

Anju T Sudhakar April 16, 2019, 10:07 a.m. UTC | #2
On 4/16/19 3:14 PM, Anju T Sudhakar wrote:
> Hi,
>
> Kindly ignore this series, since patch 5/5 in this series doesn't 
> incorporate the event-format change
>
> that I've done in v4 of this series.
>
>
> Apologies for the inconvenience. I will post the updated v5 soon.
>
>
s/v5/v4


> Thanks,
>
> Anju
>