perf vendor events power10: Update JSON/events

Message ID 20240723052154.96202-1-kjain@linux.ibm.com (mailing list archive)
State Handled Elsewhere, archived

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_perf fail perf (ubuntu-18.04, ppc64) failed at step Build.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 5 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 21 jobs.

Commit Message

kajoljain July 23, 2024, 5:21 a.m. UTC
Update JSON/events for power10 platform with additional events.
Also move PM_VECTOR_LD_CMPL event from others.json to
frontend.json file.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 .../arch/powerpc/power10/frontend.json        |   5 +
 .../arch/powerpc/power10/others.json          | 100 +++++++++++++++++-
 2 files changed, 100 insertions(+), 5 deletions(-)

Comments

Disha Goel July 23, 2024, 7:05 a.m. UTC | #1
On 23/07/24 10:51 am, Kajol Jain wrote:

> Update JSON/events for power10 platform with additional events.
> Also move PM_VECTOR_LD_CMPL event from others.json to
> frontend.json file.
>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>

I have tested the patch on a power10 machine. Looks good to me.

Tested-by: Disha Goel <disgoel@linux.ibm.com>

> ---
>   .../arch/powerpc/power10/frontend.json        |   5 +
>   .../arch/powerpc/power10/others.json          | 100 +++++++++++++++++-
>   2 files changed, 100 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> index 5977f5e64212..53660c279286 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> @@ -74,6 +74,11 @@
>       "EventName": "PM_ISSUE_KILL",
>       "BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group."
>     },
> +  {
> +    "EventCode": "0x44054",
> +    "EventName": "PM_VECTOR_LD_CMPL",
> +    "BriefDescription": "Vector load instruction completed."
> +  },
>     {
>       "EventCode": "0x44056",
>       "EventName": "PM_VECTOR_ST_CMPL",
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> index fcf8a8ebe7bd..53ca610152fa 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> @@ -94,11 +94,6 @@
>       "EventName": "PM_L1_ICACHE_RELOADED_ALL",
>       "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch."
>     },
> -  {
> -    "EventCode": "0x44054",
> -    "EventName": "PM_VECTOR_LD_CMPL",
> -    "BriefDescription": "Vector load instruction completed."
> -  },
>     {
>       "EventCode": "0x4D05E",
>       "EventName": "PM_BR_CMPL",
> @@ -108,5 +103,100 @@
>       "EventCode": "0x400F0",
>       "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
>       "BriefDescription": "Load missed L1, counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x00000038BC",
> +    "EventName": "PM_ISYNC_CMPL",
> +    "BriefDescription": "Isync completion count per thread."
> +  },
> +  {
> +    "EventCode": "0x000000C088",
> +    "EventName": "PM_LD0_32B_FIN",
> +    "BriefDescription": "256-bit load finished in the LD0 load execution unit."
> +  },
> +  {
> +    "EventCode": "0x000000C888",
> +    "EventName": "PM_LD1_32B_FIN",
> +    "BriefDescription": "256-bit load finished in the LD1 load execution unit."
> +  },
> +  {
> +    "EventCode": "0x000000C090",
> +    "EventName": "PM_LD0_UNALIGNED_FIN",
> +    "BriefDescription": "Load instructions in LD0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C890",
> +    "EventName": "PM_LD1_UNALIGNED_FIN",
> +    "BriefDescription": "Load instructions in LD1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C0A4",
> +    "EventName": "PM_ST0_UNALIGNED_FIN",
> +    "BriefDescription": "Store instructions in ST0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C8A4",
> +    "EventName": "PM_ST1_UNALIGNED_FIN",
> +    "BriefDescription": "Store instructions in ST1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C8B8",
> +    "EventName": "PM_STCX_SUCCESS_CMPL",
> +    "BriefDescription": "STCX instructions that completed successfully. Specifically, counts only when a pass status is returned from the nest."
> +  },
> +  {
> +    "EventCode": "0x000000D0B4",
> +    "EventName": "PM_DC_PREF_STRIDED_CONF",
> +    "BriefDescription": "A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software."
> +  },
> +  {
> +    "EventCode": "0x000000F880",
> +    "EventName": "PM_SNOOP_TLBIE_CYC",
> +    "BriefDescription": "Cycles in which TLBIE snoops are executed in the LSU."
> +  },
> +  {
> +    "EventCode": "0x000000F084",
> +    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which the data cache is being walked."
> +  },
> +  {
> +    "EventCode": "0x000000F884",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which older stores are still draining."
> +  },
> +  {
> +    "EventCode": "0x000000F088",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which older loads are still draining."
> +  },
> +  {
> +    "EventCode": "0x000000F08C",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which the Load-Store unit is waiting for the MMU to finish invalidation."
> +  },
> +  {
> +    "EventCode": "0x0000004884",
> +    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
> +    "BriefDescription": "Cycles in which no instructions are fetched because there is no room in the instruction buffers."
> +  },
> +  {
> +    "EventCode": "0x00000048B4",
> +    "EventName": "PM_BR_TKN_UNCOND_FIN",
> +    "BriefDescription": "An unconditional branch finished. All unconditional branches are taken."
> +  },
> +  {
> +    "EventCode": "0x0B0000016080",
> +    "EventName": "PM_L2_TLBIE_SLBIE_START",
> +    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG operation from the core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> +  },
> +  {
> +    "EventCode": "0x0B0000016880",
> +    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command was held in a hottemp condition by the NCU Master. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_TLBIE_SLBIE_SENT to obtain the average time a TLBIE/SLBIEG/SLBIAG command was held. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> +  },
> +  {
> +    "EventCode": "0x0B0000026880",
> +    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>     }
>   ]
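
The scaling notes in the PM_L2_TLBIE_SLBIE_DELAY and PM_L2_SNP_TLBIE_SLBIE_DELAY descriptions can be sketched as a small post-processing step. This is only an illustration: the function names are hypothetical, and it assumes the "multiply by 2" clock-domain correction and the "multiply by 1000" cycle conversion compose multiplicatively, which the descriptions do not state explicitly.

```python
# Factors quoted from the BriefDescription text in the patch.
CLOCK_DOMAIN_FACTOR = 2   # "Event count should be multiplied by 2"
CYCLES_PER_COUNT = 1000   # "Multiply this count by 1000 to obtain the total number of cycles"


def scale_l2_count(raw_count: int) -> int:
    """Apply the 2:1 clock-domain correction to a raw L2 event count."""
    return raw_count * CLOCK_DOMAIN_FACTOR


def delay_cycles(raw_count: int) -> int:
    """Convert a raw *_DELAY count to cycles.

    Assumption: both corrections apply and multiply together.
    """
    return scale_l2_count(raw_count) * CYCLES_PER_COUNT


if __name__ == "__main__":
    raw = 3  # made-up raw counter value, for illustration only
    print(scale_l2_count(raw), delay_cycles(raw))
```

Dividing the resulting cycle figure by the corresponding scaled operation count would then give the per-operation average mentioned in the descriptions.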
Ian Rogers July 23, 2024, 4:02 p.m. UTC | #2
On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
>
> Update JSON/events for power10 platform with additional events.
> Also move PM_VECTOR_LD_CMPL event from others.json to
> frontend.json file.
>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>

Reviewed-by: Ian Rogers <irogers@google.com>

> ---
>  .../arch/powerpc/power10/frontend.json        |   5 +
>  .../arch/powerpc/power10/others.json          | 100 +++++++++++++++++-
>  2 files changed, 100 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> index 5977f5e64212..53660c279286 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> @@ -74,6 +74,11 @@
>      "EventName": "PM_ISSUE_KILL",
>      "BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group."
>    },
> +  {
> +    "EventCode": "0x44054",
> +    "EventName": "PM_VECTOR_LD_CMPL",
> +    "BriefDescription": "Vector load instruction completed."
> +  },
>    {
>      "EventCode": "0x44056",
>      "EventName": "PM_VECTOR_ST_CMPL",
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> index fcf8a8ebe7bd..53ca610152fa 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json

The "topic" of an event is taken from the filename, here the topic
will be "others".
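
As a rough sketch of that filename-to-topic mapping (not the actual perf build code; the helper name is hypothetical, and treating "-" as a space is an assumption), the topic can be thought of as the file's base name with the ".json" suffix removed:

```python
from pathlib import Path


def topic_from_filename(path: str) -> str:
    """Hypothetical helper: derive an event topic from a JSON file path."""
    name = Path(path).name
    if name.endswith(".json"):
        name = name[: -len(".json")]
    # Assumption: dashes in the file name read as spaces in the topic.
    return name.replace("-", " ")


if __name__ == "__main__":
    base = "tools/perf/pmu-events/arch/powerpc/power10/"
    for fname in ("others.json", "frontend.json"):
        print(fname, "->", topic_from_filename(base + fname))
```

Under this sketch, moving an event between files, as the patch does for PM_VECTOR_LD_CMPL, is what changes the topic it is listed under.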

> @@ -94,11 +94,6 @@
>      "EventName": "PM_L1_ICACHE_RELOADED_ALL",
>      "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch."
>    },
> -  {
> -    "EventCode": "0x44054",
> -    "EventName": "PM_VECTOR_LD_CMPL",
> -    "BriefDescription": "Vector load instruction completed."
> -  },
>    {
>      "EventCode": "0x4D05E",
>      "EventName": "PM_BR_CMPL",
> @@ -108,5 +103,100 @@
>      "EventCode": "0x400F0",
>      "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
>      "BriefDescription": "Load missed L1, counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x00000038BC",
> +    "EventName": "PM_ISYNC_CMPL",
> +    "BriefDescription": "Isync completion count per thread."
> +  },
> +  {
> +    "EventCode": "0x000000C088",
> +    "EventName": "PM_LD0_32B_FIN",
> +    "BriefDescription": "256-bit load finished in the LD0 load execution unit."
> +  },
> +  {
> +    "EventCode": "0x000000C888",
> +    "EventName": "PM_LD1_32B_FIN",
> +    "BriefDescription": "256-bit load finished in the LD1 load execution unit."
> +  },
> +  {
> +    "EventCode": "0x000000C090",
> +    "EventName": "PM_LD0_UNALIGNED_FIN",
> +    "BriefDescription": "Load instructions in LD0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C890",
> +    "EventName": "PM_LD1_UNALIGNED_FIN",
> +    "BriefDescription": "Load instructions in LD1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C0A4",
> +    "EventName": "PM_ST0_UNALIGNED_FIN",
> +    "BriefDescription": "Store instructions in ST0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C8A4",
> +    "EventName": "PM_ST1_UNALIGNED_FIN",
> +    "BriefDescription": "Store instructions in ST1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
> +  },
> +  {
> +    "EventCode": "0x000000C8B8",
> +    "EventName": "PM_STCX_SUCCESS_CMPL",
> +    "BriefDescription": "STCX instructions that completed successfully. Specifically, counts only when a pass status is returned from the nest."
> +  },
> +  {
> +    "EventCode": "0x000000D0B4",
> +    "EventName": "PM_DC_PREF_STRIDED_CONF",
> +    "BriefDescription": "A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software."
> +  },
> +  {
> +    "EventCode": "0x000000F880",
> +    "EventName": "PM_SNOOP_TLBIE_CYC",
> +    "BriefDescription": "Cycles in which TLBIE snoops are executed in the LSU."
> +  },

Perhaps the topics here should be memory or translation?

> +  {
> +    "EventCode": "0x000000F084",
> +    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which the data cache is being walked."
> +  },
> +  {
> +    "EventCode": "0x000000F884",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which older stores are still draining."
> +  },
> +  {
> +    "EventCode": "0x000000F088",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which older loads are still draining."
> +  },
> +  {
> +    "EventCode": "0x000000F08C",
> +    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
> +    "BriefDescription": "TLBIE snoop cycles in which the Load-Store unit is waiting for the MMU to finish invalidation."
> +  },
> +  {
> +    "EventCode": "0x0000004884",
> +    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
> +    "BriefDescription": "Cycles in which no instructions are fetched because there is no room in the instruction buffers."
> +  },
> +  {
> +    "EventCode": "0x00000048B4",
> +    "EventName": "PM_BR_TKN_UNCOND_FIN",
> +    "BriefDescription": "An unconditional branch finished. All unconditional branches are taken."

I see PM_BR_TAKEN_CMPL in
tools/perf/pmu-events/arch/powerpc/power10/frontend.json, so maybe it
makes sense to put this event in that topic?

Thanks,
Ian

> +  },
> +  {
> +    "EventCode": "0x0B0000016080",
> +    "EventName": "PM_L2_TLBIE_SLBIE_START",
> +    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG operation from the core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> +  },
> +  {
> +    "EventCode": "0x0B0000016880",
> +    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command was held in a hottemp condition by the NCU Master. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_TLBIE_SLBIE_SENT to obtain the average time a TLBIE/SLBIEG/SLBIAG command was held. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> +  },
> +  {
> +    "EventCode": "0x0B0000026880",
> +    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>    }
>  ]
> --
> 2.43.0
>
Arnaldo Carvalho de Melo July 26, 2024, 2:08 p.m. UTC | #3
On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
> On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
> >
> > Update JSON/events for power10 platform with additional events.
> > Also move PM_VECTOR_LD_CMPL event from others.json to
> > frontend.json file.
> >
> > Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> 
> Reviewed-by: Ian Rogers <irogers@google.com>

Thanks, applied to tmp.perf-tools-next,

- Arnaldo
 
kajoljain July 30, 2024, 1:53 p.m. UTC | #4
On 7/23/24 21:32, Ian Rogers wrote:
> On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
>>
>> Update JSON/events for power10 platform with additional events.
>> Also move PM_VECTOR_LD_CMPL event from others.json to
>> frontend.json file.
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> 
> Reviewed-by: Ian Rogers <irogers@google.com>

Hi Ian,
  Thanks for reviewing the patch and for all the suggestions. These JSON
files are generated according to an internally defined format. I will send
a follow-up patch if we can move these events.

Thanks,
Kajol Jain

> 
>> ---
>>  .../arch/powerpc/power10/frontend.json        |   5 +
>>  .../arch/powerpc/power10/others.json          | 100 +++++++++++++++++-
>>  2 files changed, 100 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> index 5977f5e64212..53660c279286 100644
>> --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> @@ -74,6 +74,11 @@
>>      "EventName": "PM_ISSUE_KILL",
>>      "BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group."
>>    },
>> +  {
>> +    "EventCode": "0x44054",
>> +    "EventName": "PM_VECTOR_LD_CMPL",
>> +    "BriefDescription": "Vector load instruction completed."
>> +  },
>>    {
>>      "EventCode": "0x44056",
>>      "EventName": "PM_VECTOR_ST_CMPL",
>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> index fcf8a8ebe7bd..53ca610152fa 100644
>> --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> 
> The "topic" of an event is taken from the filename, here the topic
> will be "others".
> 
>> @@ -94,11 +94,6 @@
>>      "EventName": "PM_L1_ICACHE_RELOADED_ALL",
>>      "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch."
>>    },
>> -  {
>> -    "EventCode": "0x44054",
>> -    "EventName": "PM_VECTOR_LD_CMPL",
>> -    "BriefDescription": "Vector load instruction completed."
>> -  },
>>    {
>>      "EventCode": "0x4D05E",
>>      "EventName": "PM_BR_CMPL",
>> @@ -108,5 +103,100 @@
>>      "EventCode": "0x400F0",
>>      "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
>>      "BriefDescription": "Load missed L1, counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x00000038BC",
>> +    "EventName": "PM_ISYNC_CMPL",
>> +    "BriefDescription": "Isync completion count per thread."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C088",
>> +    "EventName": "PM_LD0_32B_FIN",
>> +    "BriefDescription": "256-bit load finished in the LD0 load execution unit."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C888",
>> +    "EventName": "PM_LD1_32B_FIN",
>> +    "BriefDescription": "256-bit load finished in the LD1 load execution unit."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C090",
>> +    "EventName": "PM_LD0_UNALIGNED_FIN",
>> +    "BriefDescription": "Load instructions in LD0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C890",
>> +    "EventName": "PM_LD1_UNALIGNED_FIN",
>> +    "BriefDescription": "Load instructions in LD1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C0A4",
>> +    "EventName": "PM_ST0_UNALIGNED_FIN",
>> +    "BriefDescription": "Store instructions in ST0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C8A4",
>> +    "EventName": "PM_ST1_UNALIGNED_FIN",
>> +    "BriefDescription": "Store instructions in ST1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C8B8",
>> +    "EventName": "PM_STCX_SUCCESS_CMPL",
>> +    "BriefDescription": "STCX instructions that completed successfully. Specifically, counts only when a pass status is returned from the nest."
>> +  },
>> +  {
>> +    "EventCode": "0x000000D0B4",
>> +    "EventName": "PM_DC_PREF_STRIDED_CONF",
>> +    "BriefDescription": "A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F880",
>> +    "EventName": "PM_SNOOP_TLBIE_CYC",
>> +    "BriefDescription": "Cycles in which TLBIE snoops are executed in the LSU."
>> +  },
> 
> Perhaps the topics here should be memory or translation?
> 
>> +  {
>> +    "EventCode": "0x000000F084",
>> +    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which the data cache is being walked."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F884",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which older stores are still draining."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F088",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which older loads are still draining."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F08C",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which the Load-Store unit is waiting for the MMU to finish invalidation."
>> +  },
>> +  {
>> +    "EventCode": "0x0000004884",
>> +    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
>> +    "BriefDescription": "Cycles in which no instructions are fetched because there is no room in the instruction buffers."
>> +  },
>> +  {
>> +    "EventCode": "0x00000048B4",
>> +    "EventName": "PM_BR_TKN_UNCOND_FIN",
>> +    "BriefDescription": "An unconditional branch finished. All unconditional branches are taken."
> 
> I see PM_BR_TAKEN_CMPL in
> tools/perf/pmu-events/arch/powerpc/power10/frontend.json, so maybe it
> makes sense to put this event in that topic?
> 
> Thanks,
> Ian
> 
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000016080",
>> +    "EventName": "PM_L2_TLBIE_SLBIE_START",
>> +    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG operation from the core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000016880",
>> +    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
>> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command was held in a hottemp condition by the NCU Master. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_TLBIE_SLBIE_SENT to obtain the average time a TLBIE/SLBIEG/SLBIAG command was held. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000026880",
>> +    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
>> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>>    }
>>  ]
>> --
>> 2.43.0
>>
kajoljain July 30, 2024, 1:53 p.m. UTC | #5
On 7/23/24 12:35, Disha Goel wrote:
> On 23/07/24 10:51 am, Kajol Jain wrote:
> 
>> Update JSON/events for power10 platform with additional events.
>> Also move PM_VECTOR_LD_CMPL event from others.json to
>> frontend.json file.
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> 
> I have tested the patch on power10 machine. Looks good to me.
> 

Hi Disha,
   Thanks for testing this patch.

Thanks,
Kajol Jain

> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> 
>> ---
>>   .../arch/powerpc/power10/frontend.json        |   5 +
>>   .../arch/powerpc/power10/others.json          | 100 +++++++++++++++++-
>>   2 files changed, 100 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> index 5977f5e64212..53660c279286 100644
>> --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>> @@ -74,6 +74,11 @@
>>       "EventName": "PM_ISSUE_KILL",
>>       "BriefDescription": "Cycles in which an instruction or group of
>> instructions were cancelled after being issued. This event increments
>> once per occurrence, regardless of how many instructions are included
>> in the issue group."
>>     },
>> +  {
>> +    "EventCode": "0x44054",
>> +    "EventName": "PM_VECTOR_LD_CMPL",
>> +    "BriefDescription": "Vector load instruction completed."
>> +  },
>>     {
>>       "EventCode": "0x44056",
>>       "EventName": "PM_VECTOR_ST_CMPL",
>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> b/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> index fcf8a8ebe7bd..53ca610152fa 100644
>> --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
>> @@ -94,11 +94,6 @@
>>       "EventName": "PM_L1_ICACHE_RELOADED_ALL",
>>       "BriefDescription": "Counts all instruction cache reloads
>> includes demand, prefetch, prefetch turned into demand and demand
>> turned into prefetch."
>>     },
>> -  {
>> -    "EventCode": "0x44054",
>> -    "EventName": "PM_VECTOR_LD_CMPL",
>> -    "BriefDescription": "Vector load instruction completed."
>> -  },
>>     {
>>       "EventCode": "0x4D05E",
>>       "EventName": "PM_BR_CMPL",
>> @@ -108,5 +103,100 @@
>>       "EventCode": "0x400F0",
>>       "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
>>       "BriefDescription": "Load missed L1, counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x00000038BC",
>> +    "EventName": "PM_ISYNC_CMPL",
>> +    "BriefDescription": "Isync completion count per thread."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C088",
>> +    "EventName": "PM_LD0_32B_FIN",
>> +    "BriefDescription": "256-bit load finished in the LD0 load
>> execution unit."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C888",
>> +    "EventName": "PM_LD1_32B_FIN",
>> +    "BriefDescription": "256-bit load finished in the LD1 load
>> execution unit."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C090",
>> +    "EventName": "PM_LD0_UNALIGNED_FIN",
>> +    "BriefDescription": "Load instructions in LD0 port that are
>> either unaligned, or treated as unaligned and require an additional
>> recycle through the pipeline using the load gather buffer. This
>> typically adds about 10 cycles to the latency of the instruction. This
>> includes loads that cross the 128 byte boundary, octword loads that
>> are not aligned, and a special forward progress case of a load that
>> does not hit in the L1 and crosses the 32 byte boundary and is
>> launched NTC. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C890",
>> +    "EventName": "PM_LD1_UNALIGNED_FIN",
>> +    "BriefDescription": "Load instructions in LD1 port that are
>> either unaligned, or treated as unaligned and require an additional
>> recycle through the pipeline using the load gather buffer. This
>> typically adds about 10 cycles to the latency of the instruction. This
>> includes loads that cross the 128 byte boundary, octword loads that
>> are not aligned, and a special forward progress case of a load that
>> does not hit in the L1 and crosses the 32 byte boundary and is
>> launched NTC. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C0A4",
>> +    "EventName": "PM_ST0_UNALIGNED_FIN",
>> +    "BriefDescription": "Store instructions in ST0 port that are
>> either unaligned, or treated as unaligned and require an additional
>> recycle through the pipeline. This typically adds about 10 cycles to
>> the latency of the instruction. This only includes stores that cross
>> the 128 byte boundary. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C8A4",
>> +    "EventName": "PM_ST1_UNALIGNED_FIN",
>> +    "BriefDescription": "Store instructions in ST1 port that are
>> either unaligned, or treated as unaligned and require an additional
>> recycle through the pipeline. This typically adds about 10 cycles to
>> the latency of the instruction. This only includes stores that cross
>> the 128 byte boundary. Counted at finish time."
>> +  },
>> +  {
>> +    "EventCode": "0x000000C8B8",
>> +    "EventName": "PM_STCX_SUCCESS_CMPL",
>> +    "BriefDescription": "STCX instructions that completed
>> successfully. Specifically, counts only when a pass status is returned
>> from the nest."
>> +  },
>> +  {
>> +    "EventCode": "0x000000D0B4",
>> +    "EventName": "PM_DC_PREF_STRIDED_CONF",
>> +    "BriefDescription": "A demand load referenced a line in an active
>> strided prefetch stream. The stream could have been allocated through
>> the hardware prefetch mechanism or through software."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F880",
>> +    "EventName": "PM_SNOOP_TLBIE_CYC",
>> +    "BriefDescription": "Cycles in which TLBIE snoops are executed in
>> the LSU."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F084",
>> +    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which the data cache
>> is being walked."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F884",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which older stores are
>> still draining."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F088",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which older loads are
>> still draining."
>> +  },
>> +  {
>> +    "EventCode": "0x000000F08C",
>> +    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
>> +    "BriefDescription": "TLBIE snoop cycles in which the Load-Store
>> unit is waiting for the MMU to finish invalidation."
>> +  },
>> +  {
>> +    "EventCode": "0x0000004884",
>> +    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
>> +    "BriefDescription": "Cycles in which no instructions are fetched
>> because there is no room in the instruction buffers."
>> +  },
>> +  {
>> +    "EventCode": "0x00000048B4",
>> +    "EventName": "PM_BR_TKN_UNCOND_FIN",
>> +    "BriefDescription": "An unconditional branch finished. All
>> unconditional branches are taken."
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000016080",
>> +    "EventName": "PM_L2_TLBIE_SLBIE_START",
>> +    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG
>> operation from the core. Event count should be multiplied by 2 since
>> the data is coming from a 2:1 clock domain and the data is time sliced
>> across all 4 threads."
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000016880",
>> +    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
>> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command
>> was held in a hottemp condition by the NCU Master. Multiply this count
>> by 1000 to obtain the total number of cycles. This can be divided by
>> PM_L2_TLBIE_SLBIE_SENT to obtain the average time a
>> TLBIE/SLBIEG/SLBIAG command was held. Event count should be multiplied
>> by 2 since the data is coming from a 2:1 clock domain and the data is
>> time sliced across all 4 threads."
>> +  },
>> +  {
>> +    "EventCode": "0x0B0000026880",
>> +    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
>> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that
>> targets this thread's LPAR was in flight while in a hottemp condition.
>> Multiply this count by 1000 to obtain the total number of cycles. This
>> can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall
>> efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie
>> doesn’t include when SnpTLB is in NCU waiting to be launched serially
>> behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay
>> window when it detects it is above its TLBIE/SLBIE threshold for
>> process SnpTLBIE/SLBIE with this core. Event count should be
>> multiplied by 2 since the data is coming from a 2:1 clock domain and
>> the data is time sliced across all 4 threads."
>>     }
>>   ]
Arnaldo Carvalho de Melo July 31, 2024, 7:44 p.m. UTC | #6
On Fri, Jul 26, 2024 at 11:08:55AM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
> > On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
> > >
> > > Update JSON/events for power10 platform with additional events.
> > > Also move PM_VECTOR_LD_CMPL event from others.json to
> > > frontend.json file.
> > >
> > > Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> > 
> > Reviewed-by: Ian Rogers <irogers@google.com>
> 
> Thanks, applied to tmp.perf-tools-next,

This seems to be causing this:

Exception processing pmu-events/arch/powerpc/power10/others.json
Traceback (most recent call last):
  File "pmu-events/jevents.py", line 1309, in <module>
    main()
  File "pmu-events/jevents.py", line 1291, in main
    ftw(arch_path, [], preprocess_one_file)
  File "pmu-events/jevents.py", line 1241, in ftw
    ftw(item.path, parents + [item.name], action)
  File "pmu-events/jevents.py", line 1239, in ftw
    action(parents, item)
  File "pmu-events/jevents.py", line 623, in preprocess_one_file
    for event in read_json_events(item.path, topic):
  File "pmu-events/jevents.py", line 440, in read_json_events
    events = json.load(open(path), object_hook=JsonEvent)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
  CC      /tmp/build/perf/bench/evlist-open-close.o
    return loads(fp.read(),
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9231: ordinal not in range(128)
pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
Makefile.perf:763: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
make[2]: *** Waiting for unfinished jobs....
  CC      /tmp/build/perf/tests/hists_cumulate.o
  CC      /tmp/build/perf/arch/powerpc/util/event.o
  CC      /tmp/build/perf/bench/breakpoint.o
  CC      /tmp/build/perf/builtin-data.o


This happened in the past, I'm now trying to figure this out :-\

This was in:

toolsbuilder@five:~$ cat dm.log/ubuntu:18.04-x-powerpc


So 32-bit powerpc, ubuntu 18.04

- Arnaldo
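
For readers following the traceback, here is a minimal sketch of why the decode fails. Byte 0xe2 is the first byte of the UTF-8 encoding of U+2019 (the right single quotation mark). The traceback ends in encodings/ascii.py, so open(path) was decoding with the ASCII codec. This is presumably because the build ran under a C/POSIX locale, which is an assumption, since the log does not say. The helper name decodes_as is hypothetical and is not part of jevents.py.

```python
# The right single quotation mark (U+2019) used in the new description.
curly = "\u2019"
raw = curly.encode("utf-8")  # three bytes; the first one is 0xe2


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


ascii_ok = decodes_as(raw, "ascii")  # the codec named in the traceback
utf8_ok = decodes_as(raw, "utf-8")   # the encoding the file is actually in
print(raw.hex(), ascii_ok, utf8_ok)
```

Passing an explicit encoding to open() is another general way to stop depending on the locale. Whether that suits jevents.py is not discussed in this thread.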
Arnaldo Carvalho de Melo July 31, 2024, 8:14 p.m. UTC | #7
On Wed, Jul 31, 2024 at 04:44:49PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Jul 26, 2024 at 11:08:55AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
> > > On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
> > > >
> > > > Update JSON/events for power10 platform with additional events.
> > > > Also move PM_VECTOR_LD_CMPL event from others.json to
> > > > frontend.json file.
> > > >
> > > > Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> > > 
> > > Reviewed-by: Ian Rogers <irogers@google.com>
> > 
> > Thanks, applied to tmp.perf-tools-next,
> 
> This seems to be causing this:
> 
> Exception processing pmu-events/arch/powerpc/power10/others.json
> Traceback (most recent call last):
>   File "pmu-events/jevents.py", line 1309, in <module>
>     main()
>   File "pmu-events/jevents.py", line 1291, in main
>     ftw(arch_path, [], preprocess_one_file)
>   File "pmu-events/jevents.py", line 1241, in ftw
>     ftw(item.path, parents + [item.name], action)
>   File "pmu-events/jevents.py", line 1239, in ftw
>     action(parents, item)
>   File "pmu-events/jevents.py", line 623, in preprocess_one_file
>     for event in read_json_events(item.path, topic):
>   File "pmu-events/jevents.py", line 440, in read_json_events
>     events = json.load(open(path), object_hook=JsonEvent)
>   File "/usr/lib/python3.6/json/__init__.py", line 296, in load
>   CC      /tmp/build/perf/bench/evlist-open-close.o
>     return loads(fp.read(),
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9231: ordinal not in range(128)
> pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
> make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
> make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
> Makefile.perf:763: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
> make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs....
>   CC      /tmp/build/perf/tests/hists_cumulate.o
>   CC      /tmp/build/perf/arch/powerpc/util/event.o
>   CC      /tmp/build/perf/bench/breakpoint.o
>   CC      /tmp/build/perf/builtin-data.o
> 
> 
> This happened in the past, I'm now trying to figure this out :-\
> 
> This was in:
> 
> toolsbuilder@five:~$ cat dm.log/ubuntu:18.04-x-powerpc
> 
> 
> So 32-bit powerpc, ubuntu 18.04

This did the trick, so I fixed it in my repo, please ack, just replacing
’ with ' :-\

- Arnaldo


diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
index 53ca610152faa237..3789304cb363bbb7 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
@@ -197,6 +197,6 @@
   {
     "EventCode": "0x0B0000026880",
     "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
-    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
+    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: 'inflight' means SnpTLB has been sent to core(ie doesn't include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a 'hottemp' delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
   }
 ]
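
A check along these lines could catch the problem before a patch is sent. This is only a sketch under stated assumptions: find_non_ascii and straighten_quotes are hypothetical helpers, not existing perf tooling, and the sample string is a shortened stand-in for the BriefDescription text above.

```python
def find_non_ascii(text):
    """Return (line, column, character) for each non-ASCII character, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


def straighten_quotes(text):
    """Replace curly single quotes (U+2018, U+2019) with the ASCII apostrophe."""
    return text.replace("\u2018", "'").replace("\u2019", "'")


# Shortened stand-in for the description that broke the build.
sample = "Note: \u2019inflight\u2019 means SnpTLB\nhas been sent to core"
hits = find_non_ascii(sample)
fixed = straighten_quotes(sample)

print(hits)                   # positions of the curly quotes
print(find_non_ascii(fixed))  # empty once the quotes are replaced
```

To check a real file, read it with an explicit encoding="utf-8" and pass the text to find_non_ascii.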
kajoljain Aug. 1, 2024, 7:33 a.m. UTC | #8
On 8/1/24 01:44, Arnaldo Carvalho de Melo wrote:
> On Wed, Jul 31, 2024 at 04:44:49PM -0300, Arnaldo Carvalho de Melo wrote:
>> On Fri, Jul 26, 2024 at 11:08:55AM -0300, Arnaldo Carvalho de Melo wrote:
>>> On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
>>>> On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
>>>>>
>>>>> Update JSON/events for power10 platform with additional events.
>>>>> Also move PM_VECTOR_LD_CMPL event from others.json to
>>>>> frontend.json file.
>>>>>
>>>>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>>>>
>>>> Reviewed-by: Ian Rogers <irogers@google.com>
>>>
>>> Thanks, applied to tmp.perf-tools-next,
>>
>> This seems to be causing this:
>>
>> Exception processing pmu-events/arch/powerpc/power10/others.json
>> Traceback (most recent call last):
>>   File "pmu-events/jevents.py", line 1309, in <module>
>>     main()
>>   File "pmu-events/jevents.py", line 1291, in main
>>     ftw(arch_path, [], preprocess_one_file)
>>   File "pmu-events/jevents.py", line 1241, in ftw
>>     ftw(item.path, parents + [item.name], action)
>>   File "pmu-events/jevents.py", line 1239, in ftw
>>     action(parents, item)
>>   File "pmu-events/jevents.py", line 623, in preprocess_one_file
>>     for event in read_json_events(item.path, topic):
>>   File "pmu-events/jevents.py", line 440, in read_json_events
>>     events = json.load(open(path), object_hook=JsonEvent)
>>   File "/usr/lib/python3.6/json/__init__.py", line 296, in load
>>   CC      /tmp/build/perf/bench/evlist-open-close.o
>>     return loads(fp.read(),
>>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>>     return codecs.ascii_decode(input, self.errors)[0]
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9231: ordinal not in range(128)
>> pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
>> make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
>> make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
>> Makefile.perf:763: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
>> make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
>> make[2]: *** Waiting for unfinished jobs....
>>   CC      /tmp/build/perf/tests/hists_cumulate.o
>>   CC      /tmp/build/perf/arch/powerpc/util/event.o
>>   CC      /tmp/build/perf/bench/breakpoint.o
>>   CC      /tmp/build/perf/builtin-data.o
>>
>>
>> This happened in the past, I'm now trying to figure this out :-\
>>
>> This was in:
>>
>> toolsbuilder@five:~$ cat dm.log/ubuntu:18.04-x-powerpc
>>
>>
>> So 32-bit powerpc, ubuntu 18.04
> 
> This did the trick, so I fixed it in my repo, please ack, just replacing
> ’ with ' :-\
> 
> - Arnaldo
> 

Hi Arnaldo,
  Thanks for fixing it. In the next series of patches I will make sure we
also check this combination to avoid the ASCII issue.

Change looks fine to me.

Thanks,
Kajol Jain

> 
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> index 53ca610152faa237..3789304cb363bbb7 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> @@ -197,6 +197,6 @@
>    {
>      "EventCode": "0x0B0000026880",
>      "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
> -    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: 'inflight' means SnpTLB has been sent to core(ie doesn't include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a 'hottemp' delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
>    }
>  ]
>
Arnaldo Carvalho de Melo Aug. 1, 2024, 2:56 p.m. UTC | #9
On Thu, Aug 01, 2024 at 01:03:44PM +0530, kajoljain wrote:
> 
> 
> On 8/1/24 01:44, Arnaldo Carvalho de Melo wrote:
> > On Wed, Jul 31, 2024 at 04:44:49PM -0300, Arnaldo Carvalho de Melo wrote:
> >> On Fri, Jul 26, 2024 at 11:08:55AM -0300, Arnaldo Carvalho de Melo wrote:
> >>> On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
> >>>> On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kjain@linux.ibm.com> wrote:
> >>>>>
> >>>>> Update JSON/events for power10 platform with additional events.
> >>>>> Also move PM_VECTOR_LD_CMPL event from others.json to
> >>>>> frontend.json file.
> >>>>>
> >>>>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> >>>>
> >>>> Reviewed-by: Ian Rogers <irogers@google.com>
> >>>
> >>> Thanks, applied to tmp.perf-tools-next,
> >>
> >> This seems to be causing this:
> >>
> >> Exception processing pmu-events/arch/powerpc/power10/others.json
> >> Traceback (most recent call last):
> >>   File "pmu-events/jevents.py", line 1309, in <module>
> >>     main()
> >>   File "pmu-events/jevents.py", line 1291, in main
> >>     ftw(arch_path, [], preprocess_one_file)
> >>   File "pmu-events/jevents.py", line 1241, in ftw
> >>     ftw(item.path, parents + [item.name], action)
> >>   File "pmu-events/jevents.py", line 1239, in ftw
> >>     action(parents, item)
> >>   File "pmu-events/jevents.py", line 623, in preprocess_one_file
> >>     for event in read_json_events(item.path, topic):
> >>   File "pmu-events/jevents.py", line 440, in read_json_events
> >>     events = json.load(open(path), object_hook=JsonEvent)
> >>   File "/usr/lib/python3.6/json/__init__.py", line 296, in load
> >>   CC      /tmp/build/perf/bench/evlist-open-close.o
> >>     return loads(fp.read(),
> >>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >>     return codecs.ascii_decode(input, self.errors)[0]
> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9231: ordinal not in range(128)
> >> pmu-events/Build:35: recipe for target '/tmp/build/perf/pmu-events/pmu-events.c' failed
> >> make[3]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 1
> >> make[3]: *** Deleting file '/tmp/build/perf/pmu-events/pmu-events.c'
> >> Makefile.perf:763: recipe for target '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
> >> make[2]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
> >> make[2]: *** Waiting for unfinished jobs....
> >>   CC      /tmp/build/perf/tests/hists_cumulate.o
> >>   CC      /tmp/build/perf/arch/powerpc/util/event.o
> >>   CC      /tmp/build/perf/bench/breakpoint.o
> >>   CC      /tmp/build/perf/builtin-data.o
> >>
> >>
> >> This happened in the past, I'm now trying to figure this out :-\
> >>
> >> This was in:
> >>
> >> toolsbuilder@five:~$ cat dm.log/ubuntu:18.04-x-powerpc
> >>
> >>
> >> So 32-bit powerpc, ubuntu 18.04
> > 
> > This did the trick, so I fixed it in my repo, please ack, just replacing
> > ’ with ' :-\
> > 
> > - Arnaldo
> > 
> 
> Hi Arnaldo,
>   Thanks for fixing it. In the next series of patches I will make sure we
> also check this combination to avoid the ASCII issue.
> 
> Change looks fine to me.

Thanks for checking,

- Arnaldo
 
> Thanks,
> Kajol Jain
> 
> > 
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > index 53ca610152faa237..3789304cb363bbb7 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > @@ -197,6 +197,6 @@
> >    {
> >      "EventCode": "0x0B0000026880",
> >      "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
> > -    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> > +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: 'inflight' means SnpTLB has been sent to core(ie doesn't include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a 'hottemp' delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
> >    }
> >  ]
> >
Patch

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
index 5977f5e64212..53660c279286 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
@@ -74,6 +74,11 @@ 
     "EventName": "PM_ISSUE_KILL",
     "BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group."
   },
+  {
+    "EventCode": "0x44054",
+    "EventName": "PM_VECTOR_LD_CMPL",
+    "BriefDescription": "Vector load instruction completed."
+  },
   {
     "EventCode": "0x44056",
     "EventName": "PM_VECTOR_ST_CMPL",
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json b/tools/perf/pmu-events/arch/powerpc/power10/others.json
index fcf8a8ebe7bd..53ca610152fa 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
@@ -94,11 +94,6 @@ 
     "EventName": "PM_L1_ICACHE_RELOADED_ALL",
     "BriefDescription": "Counts all instruction cache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch."
   },
-  {
-    "EventCode": "0x44054",
-    "EventName": "PM_VECTOR_LD_CMPL",
-    "BriefDescription": "Vector load instruction completed."
-  },
   {
     "EventCode": "0x4D05E",
     "EventName": "PM_BR_CMPL",
@@ -108,5 +103,100 @@ 
     "EventCode": "0x400F0",
     "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
     "BriefDescription": "Load missed L1, counted at finish time."
+  },
+  {
+    "EventCode": "0x00000038BC",
+    "EventName": "PM_ISYNC_CMPL",
+    "BriefDescription": "Isync completion count per thread."
+  },
+  {
+    "EventCode": "0x000000C088",
+    "EventName": "PM_LD0_32B_FIN",
+    "BriefDescription": "256-bit load finished in the LD0 load execution unit."
+  },
+  {
+    "EventCode": "0x000000C888",
+    "EventName": "PM_LD1_32B_FIN",
+    "BriefDescription": "256-bit load finished in the LD1 load execution unit."
+  },
+  {
+    "EventCode": "0x000000C090",
+    "EventName": "PM_LD0_UNALIGNED_FIN",
+    "BriefDescription": "Load instructions in LD0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
+  },
+  {
+    "EventCode": "0x000000C890",
+    "EventName": "PM_LD1_UNALIGNED_FIN",
+    "BriefDescription": "Load instructions in LD1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline using the load gather buffer. This typically adds about 10 cycles to the latency of the instruction. This includes loads that cross the 128 byte boundary, octword loads that are not aligned, and a special forward progress case of a load that does not hit in the L1 and crosses the 32 byte boundary and is launched NTC. Counted at finish time."
+  },
+  {
+    "EventCode": "0x000000C0A4",
+    "EventName": "PM_ST0_UNALIGNED_FIN",
+    "BriefDescription": "Store instructions in ST0 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
+  },
+  {
+    "EventCode": "0x000000C8A4",
+    "EventName": "PM_ST1_UNALIGNED_FIN",
+    "BriefDescription": "Store instructions in ST1 port that are either unaligned, or treated as unaligned and require an additional recycle through the pipeline. This typically adds about 10 cycles to the latency of the instruction. This only includes stores that cross the 128 byte boundary. Counted at finish time."
+  },
+  {
+    "EventCode": "0x000000C8B8",
+    "EventName": "PM_STCX_SUCCESS_CMPL",
+    "BriefDescription": "STCX instructions that completed successfully. Specifically, counts only when a pass status is returned from the nest."
+  },
+  {
+    "EventCode": "0x000000D0B4",
+    "EventName": "PM_DC_PREF_STRIDED_CONF",
+    "BriefDescription": "A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software."
+  },
+  {
+    "EventCode": "0x000000F880",
+    "EventName": "PM_SNOOP_TLBIE_CYC",
+    "BriefDescription": "Cycles in which TLBIE snoops are executed in the LSU."
+  },
+  {
+    "EventCode": "0x000000F084",
+    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
+    "BriefDescription": "TLBIE snoop cycles in which the data cache is being walked."
+  },
+  {
+    "EventCode": "0x000000F884",
+    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
+    "BriefDescription": "TLBIE snoop cycles in which older stores are still draining."
+  },
+  {
+    "EventCode": "0x000000F088",
+    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
+    "BriefDescription": "TLBIE snoop cycles in which older loads are still draining."
+  },
+  {
+    "EventCode": "0x000000F08C",
+    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
+    "BriefDescription": "TLBIE snoop cycles in which the Load-Store unit is waiting for the MMU to finish invalidation."
+  },
+  {
+    "EventCode": "0x0000004884",
+    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
+    "BriefDescription": "Cycles in which no instructions are fetched because there is no room in the instruction buffers."
+  },
+  {
+    "EventCode": "0x00000048B4",
+    "EventName": "PM_BR_TKN_UNCOND_FIN",
+    "BriefDescription": "An unconditional branch finished. All unconditional branches are taken."
+  },
+  {
+    "EventCode": "0x0B0000016080",
+    "EventName": "PM_L2_TLBIE_SLBIE_START",
+    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG operation from the core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
+  },
+  {
+    "EventCode": "0x0B0000016880",
+    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
+    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command was held in a hottemp condition by the NCU Master. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_TLBIE_SLBIE_SENT to obtain the average time a TLBIE/SLBIEG/SLBIAG command was held. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
+  },
+  {
+    "EventCode": "0x0B0000026880",
+    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
+    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets this thread's LPAR was in flight while in a hottemp condition. Multiply this count by 1000 to obtain the total number of cycles. This can be divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency. Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when SnpTLB is in NCU waiting to be launched serially behind different SnpTLB). The NCU Snooper gets in a ’hottemp’ delay window when it detects it is above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core. Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain and the data is time sliced across all 4 threads."
   }
 ]
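
Several of the added BriefDescription strings carry scaling notes ("Multiply this count by 1000 to obtain the total number of cycles", "Event count should be multiplied by 2 since the data is coming from a 2:1 clock domain"). The following is a minimal sketch of that arithmetic for post-processing raw counts, not part of the patch. The function and constant names are hypothetical, and it assumes the x1000 and x2 factors apply together for the *_DELAY events, which the descriptions do not state explicitly.

```python
# Hypothetical post-processing helpers; only the factors 1000 and 2
# come from the BriefDescription text in the patch.
HOTTEMP_CYCLE_UNIT = 1000  # "Multiply this count by 1000 to obtain the total number of cycles"
CLOCK_DOMAIN_FACTOR = 2    # "Event count should be multiplied by 2 ... 2:1 clock domain"


def scale_op_count(raw_count: int) -> int:
    """Scale a raw operation count (e.g. PM_L2_TLBIE_SLBIE_START)
    for the 2:1 clock domain."""
    return raw_count * CLOCK_DOMAIN_FACTOR


def scale_delay_cycles(raw_count: int) -> int:
    """Scale a raw *_DELAY count to cycles.

    Assumption: the x1000 unit and the x2 clock-domain factor
    are applied together.
    """
    return raw_count * HOTTEMP_CYCLE_UNIT * CLOCK_DOMAIN_FACTOR


def average_delay(delay_cycles: int, op_count: int) -> float:
    """Average cycles per operation from already-scaled values.

    Returns 0.0 when no operations were counted, to avoid
    dividing by zero.
    """
    if op_count == 0:
        return 0.0
    return delay_cycles / op_count
```

For example, a raw PM_L2_TLBIE_SLBIE_DELAY reading of 3 would scale to 6000 cycles under these assumptions. The raw readings themselves would come from perf stat or a similar tool.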