From patchwork Sat Jul 13 16:55:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Athira Rajeev X-Patchwork-Id: 1960192 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=rc9n2lgm; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WLvkH6pfzz1xr4 for ; Sun, 14 Jul 2024 02:56:22 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=rc9n2lgm; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4WLvkF47YVz3cZ0 for ; Sun, 14 Jul 2024 02:56:21 +1000 (AEST) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=linux.vnet.ibm.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=rc9n2lgm; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=atrajeev@linux.vnet.ibm.com; receiver=lists.ozlabs.org) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4WLvjw4Nvmz30WF for ; Sun, 14 Jul 2024 02:56:03 +1000 (AEST) Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 46DFTPlJ021995; Sat, 13 Jul 2024 16:55:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; s=pp1; bh=/1Kkiz6GdbEfd gIEGqakVd1VypJoM5XwAbjkEGwtAj4=; b=rc9n2lgmHYMS+LHC45L5WmS522M4u TgnDV610lQ7vGqRG37FHDPItDCi/oT94RpOBMVDj6NW4ZOdOlS+Q4rB5DnWEw/5V 3D6v/iUdvAAFKy8bDjUBQPqOEAzCj/rtUfW0JIQZyAqDQrlc3ZJnlzjccbTUIn5P T2t821D8UR66oWDl8wz4aZ78J6Htt/1YHl/mHFfsYEALrzxJ6qGYbb1o93/kqETm xEDFaf/v8V4i1n6qvlYsl6+GZ2SlICWjQZTcyT544NIJKm020J3x7GRki6lqRTV+ Avca72MV6rANO3PZrF3nunrIXOEmL1XuQnMOLo57ydkFzWGfAziAAnpIw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 40brr10bxw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 13 Jul 2024 16:55:43 +0000 (GMT) Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.18.0.8/8.18.0.8) with ESMTP id 46DGthhb010602; Sat, 13 Jul 2024 16:55:43 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 40brr10bxe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 13 Jul 2024 16:55:43 +0000 (GMT) Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 46DFZVaH010225; Sat, 13 Jul 2024 16:55:41 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 40bqxks5m7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 13 Jul 2024 16:55:41 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 46DGtab341812414 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 13 Jul 2024 16:55:38 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 253732004B; Sat, 13 Jul 2024 16:55:36 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5271D20040; Sat, 13 Jul 2024 16:55:33 +0000 (GMT) Received: from localhost.localdomain (unknown [9.43.49.134]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Sat, 13 Jul 2024 16:55:33 +0000 (GMT) From: Athira Rajeev To: acme@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, irogers@google.com, namhyung@kernel.org, segher@kernel.crashing.org, christophe.leroy@csgroup.eu Subject: [PATCH V7 00/18] Add data type profiling support for powerpc Date: Sat, 13 Jul 2024 22:25:11 +0530 Message-Id: <20240713165529.59298-1-atrajeev@linux.vnet.ibm.com> X-Mailer: git-send-email 2.35.1 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: n91al67egD-3zB5PbncaOC9wvYim8ztm X-Proofpoint-GUID: bBNQZm1qwov2H_GFv0RvxZ99t98lyHHW X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16 definitions=2024-07-13_13,2024-07-11_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 malwarescore=0 lowpriorityscore=0 impostorscore=0 bulkscore=0 mlxscore=0 adultscore=0 priorityscore=1501 suspectscore=0 mlxlogscore=999 spamscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2406140001 definitions=main-2407130127 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: atrajeev@linux.vnet.ibm.com, kjain@linux.ibm.com, linux-kernel@vger.kernel.org, akanksha@linux.ibm.com, linux-perf-users@vger.kernel.org, maddy@linux.ibm.com, disgoel@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The patchset from Namhyung added support for data type profiling in perf tool. This enabled support to associate PMU samples to data types they refer using DWARF debug information. With the upstream perf, currently it possible to run perf report or perf annotate to view the data type information on x86. Initial patchset posted here had changes need to enable data type profiling support for powerpc. https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/ Main change were: 1. powerpc instruction nmemonic table to associate load/store instructions with move_ops which is use to identify if instruction is a memory access one. 2. To get register number and access offset from the given instruction, code uses fields from "struct arch" -> objump. Added entry for powerpc here. 3. A get_arch_regnum to return register number from the register name string. But the apporach used in the initial patchset used parsing of disassembled code which the current perf tool implementation does. Example: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset. Also to find whether there is a memory reference in the operands, "memory_ref_char" field of objdump is used. For x86, "(" is used as memory_ref_char to tackle instructions of the form "mov (%rax), %rcx". In case of powerpc, not all instructions using "(" are the only memory instructions. Example, above instruction can also be of extended form (X form) "lwzx r10,0,r19". Inorder to easy identify the instruction category and extract the source/target registers, second patchset added support to use raw instruction. With raw instruction, macros are added to extract opcode and register fields. Link to second patchset: https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/ Example representation using --show-raw-insn in objdump gives result: 38 01 81 e8 ld r4,312(r1) Here "38 01 81 e8" is the raw instruction representation. In powerpc, this translates to instruction form: "ld RT,DS(RA)" and binary code as: _____________________________________ | 58 | RT | RA | DS | | ------------------------------------- 0 6 11 16 30 31 Second patchset used "objdump" again to read the raw instruction. But since there is no need to disassemble and binary code can be read directly from the DSO, third patchset (ie this patchset) uses below apporach. The apporach preferred in powerpc to parse sample for data type profiling in V3 patchset is: - Read directly from DSO using dso__data_read_offset - If that fails for any case, fallback to using libcapstone - If libcapstone is not supported, approach will use objdump Patchset adds support to pick the opcode and reg fields from this raw/binary instruction code. This approach came in from review comment by Segher Boessenkool and Christophe for the initial patchset. Apart from that, instruction tracking is enabled for powerpc and support function is added to find variables defined as registers Example, in powerpc, below two registers are defined to represent variable: 1. r13: represents local_paca register struct paca_struct *local_paca asm("r13"); 2. r1: represents stack_pointer register void *__stack_pointer asm("r1"); These are handled in this patchset. - Patch 1 is to rearrange register state type structures to header file so that it can referred from other arch specific files - Patch 2 is to make instruction tracking as a callback to"struct arch" so that it can be implemented by other archs easily and defined in arch specific files - Patch 3 is to handle state type regs array size for x86 and powerpc - Patch 4 adds support to capture and parse raw instruction in powerpc using dso__data_read_offset utility - Patch 4 also adds logic to support using objdump when doing default "perf report" or "perf annotate" since it that needs disassembled instruction. - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc - Patch 6 update parameters for reg extract functions to use raw instruction on powerpc - Patch 7 updates ins__find to carry raw_insn and also adds parse callback for memory instructions for powerpc - Patch 8 add support to identify memory instructions of opcode 31 in powerpc - Patch 9 adds more instructions to support instruction tracking in powerpc - Patch 10 and 11 handles instruction tracking for powerpc. - Patch 12, 13 and 14 add support to use libcapstone in powerpc - Patch 15 and patch 16 handles support to find global register variables - PAtch 17 updates data type compare functions data_type_cmp and sort__typeoff_sort to include var_name along with type_name in comparison. - Patch 18 handles insn-stat option for perf annotate Note: - There are remaining unknowns (25%) as seen in annotate Instruction stats below. - This patchset is not tested on powerpc32. In next step of enhancements along with handling remaining unknowns, plan to cover powerpc32 changes based on how testing goes. With the current patchset: ./perf record -a -e mem-loads sleep 1 ./perf report -s type,typeoff --hierarchy --group --stdio ./perf annotate --data-type --insn-stat perf annotate logs: ================== Annotate Instruction stats total 609, ok 446 (73.2%), bad 163 (26.8%) Name/opcode : Good Bad ----------------------------------------------------------- 58 : 323 80 32 : 49 43 34 : 33 11 OP_31_XOP_LDX : 8 20 40 : 23 0 OP_31_XOP_LWARX : 5 1 OP_31_XOP_LWZX : 2 3 OP_31_XOP_LDARX : 3 0 33 : 0 2 OP_31_XOP_LBZX : 0 1 OP_31_XOP_LWAX : 0 1 OP_31_XOP_LHZX : 0 1 perf report logs: ================= Total Lost Samples: 0 Samples: 1K of event 'mem-loads' Event count (approx.): 937238 Overhead Data Type Data Type Offset ........ ......... ................ 48.60% (unknown) (unknown) +0 (no field) 11.42% long unsigned int long unsigned int +0 (current_stack_pointer) 4.68% struct paca_struct struct paca_struct +2312 (__current) 4.57% struct paca_struct struct paca_struct +2354 (irq_soft_mask) 2.69% struct paca_struct struct paca_struct +2808 (canary) 2.68% struct paca_struct struct paca_struct +8 (paca_index) 2.24% struct paca_struct struct paca_struct +48 (data_offset) 1.43% long unsigned int long unsigned int +0 (no field) 1.41% struct vm_fault struct vm_fault +0 (vma) 1.29% struct task_struct struct task_struct +276 (flags) 1.03% struct pt_regs struct pt_regs +264 (user_regs.msr) 0.90% struct security_hook_list struct security_hook_list +0 (list.next) 0.76% struct irq_desc struct irq_desc +304 (irq_data.chip) 0.76% struct rq struct rq +2856 (cpu) 0.72% long long unsigned int long long unsigned int +0 (no field) Thanks Athira Rajeev Changelog: From v6 -> v7: - Addressed review comments from Namhyung Changed format string space to %-20s while printing instruction stats in patch 18. Use cmp_null in patch 17 while comparing var_name to properly sort with correct order. From v5 -> v6: - Addressed review comments from Namhyung Conditionally define TYPE_STATE_MAX_REGS based on arch. Added macro for defining width of the raw codes and spaces in disasm_line__parse_powerpc. Call disasm_line__parse from disasm_line__parse_powerpc for generic code. Renamed symbol__disassemble_dso to symbol__disassemble_raw. Fixed find_data_type_global_reg to correclty free var_types and change indent level. Fixed data_type_cmp and sort__typeoff_sort to include var_name in comparing data type entries. From v4 -> v5: - Addressed review comments from Namhyung Handle max number of type state regs as 16 for x86 and 32 for powerpc. Added generic support for objdump patch first and DSO read optimisation next combined patch 3 and patch 4 in patchseries V4 to one patch Changed reference for "raw_insn" to use "u32" Splitted "parse" callback patch changes and "ins__find" patch changes into two Instead of making weak function, added get_powerpc_regs to extract register and offset fields for powerpc - Addressed complation fail when "dwarf.h" is not present ie elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT when including functions that use Dwarf references. Also conditionally include some of the header files. From v3->v4: - Addressed review comments from Ian by using capston_init from "util/print_insn.c" instead of "open_capston_handle". - Addressed review comment from Namhyung by moving "opcode" field from "struct ins" to "struct disasm_line" From v2->v3: - Addressed review comments from Christophe and Namhyung for V2 - Changed the apporach in powerpc to parse sample for data type profiling as: Read directly from DSO using dso__data_read_offset If that fails for any case, fallback to using libcapstone If libcapstone is not supported, approach will use objdump - Include instructions with opcode as 31 and correctly categorize them as memory or arithmetic instructions. - Include more instructions for instruction tracking in powerpc From v1->v2: - Addressed suggestion from Christophe Leroy and Segher Boessenkool to use the binary code (raw insn) to fetch opcode, register and offset fields. - Added support for instruction tracking in powerpc - Find the register defined variables (r13 and r1 which points to local_paca and current_stack_pointer in powerpc) Athira Rajeev (18): tools/perf: Move the data structures related to register type to header file tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc tools/perf: Add disasm_line__parse to parse raw instruction for powerpc tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc tools/perf: Add parse function for memory instructions in powerpc tools/perf: Add support to identify memory instructions of opcode 31 in powerpc tools/perf: Add some of the arithmetic instructions to support instruction tracking in powerpc tools/perf: Add more instructions for instruction tracking tools/perf: Update instruction tracking for powerpc tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c tools/perf: Add support to use libcapstone in powerpc tools/perf: Add support to find global register variables using find_data_type_global_reg tools/perf: Add support for global_die to capture name of variable in case of register defined variable tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison tools/perf: Set instruction name to be used with insn-stat when using raw instruction tools/include/linux/string.h | 2 + tools/lib/string.c | 13 + tools/perf/arch/arm64/annotate/instructions.c | 3 +- .../arch/loongarch/annotate/instructions.c | 6 +- .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++ tools/perf/arch/powerpc/util/dwarf-regs.c | 53 ++ tools/perf/arch/s390/annotate/instructions.c | 5 +- tools/perf/arch/x86/annotate/instructions.c | 377 ++++++++++++ tools/perf/builtin-annotate.c | 4 +- tools/perf/util/annotate-data.c | 544 ++++-------------- tools/perf/util/annotate-data.h | 83 +++ tools/perf/util/annotate.c | 29 +- tools/perf/util/annotate.h | 6 +- tools/perf/util/disasm.c | 468 +++++++++++++-- tools/perf/util/disasm.h | 19 +- tools/perf/util/dwarf-aux.c | 1 + tools/perf/util/dwarf-aux.h | 1 + tools/perf/util/include/dwarf-regs.h | 12 + tools/perf/util/print_insn.c | 15 +- tools/perf/util/print_insn.h | 5 + tools/perf/util/sort.c | 25 +- tools/perf/util/sort.h | 1 + 22 files changed, 1421 insertions(+), 505 deletions(-) Tested-by: Kajol Jain Reviewed-by: Kajol Jain Reviewed-by: Namhyung Kim Reviewed-by: Namhyung Kim