Message ID | 20241102175115.1769468-8-xur@google.com |
---|---|
State | New |
Headers | show |
Series | Add AutoFDO and Propeller support for Clang build | expand |
Hi Rong, On Sat, Nov 02, 2024 at 10:51:14AM -0700, Rong Xu wrote: > diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller > new file mode 100644 > index 0000000000000..344190717e471 > --- /dev/null > +++ b/scripts/Makefile.propeller > @@ -0,0 +1,28 @@ > +# SPDX-License-Identifier: GPL-2.0 > + > +# Enable available and selected Clang Propeller features. > +ifdef CLANG_PROPELLER_PROFILE_PREFIX > + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections > + KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering > +else > + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels > +endif It appears that '-fbasic-block-sections=labels' has been deprecated in the main branch of LLVM, as I see a warning repeated over and over when building allmodconfig: clang: warning: argument '-fbasic-block-sections=labels' is deprecated, use '-fbasic-block-address-map' instead [-Wdeprecated] https://github.com/llvm/llvm-project/commit/7b7747dc1d3da1a829503ea9505b4cecce4f5bda Sorry that I missed this during testing, as I was only using clang-19 at the time. I think you can send a fixup on top of Masahiro's branch: https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git/log/?h=kbuild > +# Propeller requires debug information to embed module names in the profiles. > +# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO, > +# as the option should already be set. > +ifndef CONFIG_DEBUG_INFO > + ifndef CONFIG_AUTOFDO_CLANG > + CFLAGS_PROPELLER_CLANG += -gmlt > + endif > +endif > + > +ifdef CONFIG_LTO_CLANG_THIN > + ifdef CLANG_PROPELLER_PROFILE_PREFIX > + KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt > + else > + KBUILD_LDFLAGS += --lto-basic-block-sections=labels I think this might have a similar problem but I have not tested. > + endif > +endif > + > +export CFLAGS_PROPELLER_CLANG Cheers, Nathan
Thanks Nathan for finding this out! We changed the propeller option with this patch: https://github.com/llvm/llvm-project/pull/110039 Currently, this patch is only in the ToT clang (v20) and not yet released in v19. I'll add a compiler version check to the patch: if the clang version >= 20, use the new option. If this patch is later released in v19.x clang, I'll have to update the check accordingly. If I don't hear objections, I'll send a fixup on top of Masahiro's branch. Thanks, -Rong On Thu, Nov 7, 2024 at 12:45 PM Nathan Chancellor <nathan@kernel.org> wrote: > > Hi Rong, > > On Sat, Nov 02, 2024 at 10:51:14AM -0700, Rong Xu wrote: > > diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller > > new file mode 100644 > > index 0000000000000..344190717e471 > > --- /dev/null > > +++ b/scripts/Makefile.propeller > > @@ -0,0 +1,28 @@ > > +# SPDX-License-Identifier: GPL-2.0 > > + > > +# Enable available and selected Clang Propeller features. > > +ifdef CLANG_PROPELLER_PROFILE_PREFIX > > + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections > > + KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering > > +else > > + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels > > +endif > > It appears that '-fbasic-block-sections=labels' has been deprecated in > the main branch of LLVM, as I see a warning repeated over and over when > building allmodconfig: > > clang: warning: argument '-fbasic-block-sections=labels' is deprecated, use '-fbasic-block-address-map' instead [-Wdeprecated] > > https://github.com/llvm/llvm-project/commit/7b7747dc1d3da1a829503ea9505b4cecce4f5bda > > Sorry that I missed this during testing, as I was only using clang-19 at > the time. > > I think you can send a fixup on top of Masahiro's branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git/log/?h=kbuild > > > +# Propeller requires debug information to embed module names in the profiles. > > +# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO, > > +# as the option should already be set. > > +ifndef CONFIG_DEBUG_INFO > > + ifndef CONFIG_AUTOFDO_CLANG > > + CFLAGS_PROPELLER_CLANG += -gmlt > > + endif > > +endif > > + > > +ifdef CONFIG_LTO_CLANG_THIN > > + ifdef CLANG_PROPELLER_PROFILE_PREFIX > > + KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt > > + else > > + KBUILD_LDFLAGS += --lto-basic-block-sections=labels > > I think this might have a similar problem but I have not tested. > > > + endif > > +endif > > + > > +export CFLAGS_PROPELLER_CLANG > > Cheers, > Nathan
On 11/2/24 10:51 AM, Rong Xu wrote: > Add the build support for using Clang's Propeller optimizer. Like > AutoFDO, Propeller uses hardware sampling to gather information > about the frequency of execution of different code paths within a > binary. This information is then used to guide the compiler's > optimization decisions, resulting in a more efficient binary. > > The support requires a Clang compiler LLVM 19 or later, and the > create_llvm_prof tool > (https://github.com/google/autofdo/releases/tag/v0.30.1). This > commit is limited to x86 platforms that support PMU features > like LBR on Intel machines and AMD Zen3 BRS. > > Here is an example workflow for building an AutoFDO+Propeller > optimized kernel: > > 1) Build the kernel on the host machine, with AutoFDO and Propeller > build config > CONFIG_AUTOFDO_CLANG=y > CONFIG_PROPELLER_CLANG=y > then > $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> > > “<autofdo_profile>” is the profile collected when doing a non-Propeller > AutoFDO build. This step builds a kernel that has the same optimization > level as AutoFDO, plus a metadata section that records basic block > information. This kernel image runs as fast as an AutoFDO optimized > kernel. > > 2) Install the kernel on test/production machines. > > 3) Run the load tests. The '-c' option in perf specifies the sample > event period. We suggest using a suitable prime number, > like 500009, for this purpose. > For Intel platforms: > $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \ > -o <perf_file> -- <loadtest> > For AMD platforms: > The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2 > # To see if Zen3 support LBR: > $ cat proc/cpuinfo | grep " brs" > # To see if Zen4 support LBR: > $ cat proc/cpuinfo | grep amd_lbr_v2 > # If the result is yes, then collect the profile using: > $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \ > -N -b -c <count> -o <perf_file> -- <loadtest> > > 4) (Optional) Download the raw perf file to the host machine. > > 5) Generate Propeller profile: > $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \ > --format=propeller --propeller_output_module_name \ > --out=<propeller_profile_prefix>_cc_profile.txt \ > --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt > > “create_llvm_prof” is the profile conversion tool, and a prebuilt > binary for linux can be found on > https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build > from source). > > "<propeller_profile_prefix>" can be something like > "/home/user/dir/any_string". > > This command generates a pair of Propeller profiles: > "<propeller_profile_prefix>_cc_profile.txt" and > "<propeller_profile_prefix>_ld_profile.txt". > > 6) Rebuild the kernel using the AutoFDO and Propeller profile files. > CONFIG_AUTOFDO_CLANG=y > CONFIG_PROPELLER_CLANG=y > and > $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \ > CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> > > Co-developed-by: Han Shen <shenhan@google.com> > Signed-off-by: Han Shen <shenhan@google.com> > Signed-off-by: Rong Xu <xur@google.com> > Suggested-by: Sriraman Tallam <tmsriram@google.com> > Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> > Suggested-by: Nick Desaulniers <ndesaulniers@google.com> > Suggested-by: Stephane Eranian <eranian@google.com> > Tested-by: Yonghong Song <yonghong.song@linux.dev> > Tested-by: Nathan Chancellor <nathan@kernel.org> > Reviewed-by: Kees Cook <kees@kernel.org> > --- > Documentation/dev-tools/index.rst | 1 + > Documentation/dev-tools/propeller.rst | 162 ++++++++++++++++++++++++++ > MAINTAINERS | 7 ++ > Makefile | 1 + > arch/Kconfig | 19 +++ > arch/x86/Kconfig | 1 + > arch/x86/kernel/vmlinux.lds.S | 4 + > include/asm-generic/vmlinux.lds.h | 6 +- > scripts/Makefile.lib | 10 ++ > scripts/Makefile.propeller | 28 +++++ > tools/objtool/check.c | 1 + > 11 files changed, 237 insertions(+), 3 deletions(-) > create mode 100644 Documentation/dev-tools/propeller.rst > create mode 100644 scripts/Makefile.propeller > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst > index 6945644f7008a..3c0ac08b27091 100644 > --- a/Documentation/dev-tools/index.rst > +++ b/Documentation/dev-tools/index.rst > @@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst > checkuapi > gpio-sloppy-logic-analyzer > autofdo > + propeller > > > .. only:: subproject and html > diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst > new file mode 100644 > index 0000000000000..92195958e3dbc > --- /dev/null > +++ b/Documentation/dev-tools/propeller.rst > @@ -0,0 +1,162 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +===================================== > +Using Propeller with the Linux kernel > +===================================== > + > +This enables Propeller build support for the kernel when using Clang > +compiler. Propeller is a profile-guided optimization (PGO) method used > +to optimize binary executables. Like AutoFDO, it utilizes hardware > +sampling to gather information about the frequency of execution of > +different code paths within a binary. Unlike AutoFDO, this information > +is then used right before linking phase to optimize (among others) > +block layout within and across functions. > + > +A few important notes about adopting Propeller optimization: > + > +#. Although it can be used as a standalone optimization step, it is > + strongly recommended to apply Propeller on top of AutoFDO, > + AutoFDO+ThinLTO or Instrument FDO. The rest of this document > + assumes this paradigm. > + > +#. Propeller uses another round of profiling on top of > + AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves > + "build-afdo - train-afdo - build-propeller - train-propeller - > + build-optimized". > + > +#. Propeller requires LLVM 19 release or later for Clang/Clang++ > + and the linker(ld.lld). > + > +#. In addition to LLVM toolchain, Propeller requires a profiling > + conversion tool: https://github.com/google/autofdo with a release > + after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1. > + > +The Propeller optimization process involves the following steps: > + > +#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as > + you would normally do, but with a set of compile-time / link-time > + flags, so that a special metadata section is created within the > + kernel binary. The special section is only intend to be used by the > + profiling tool, it is not part of the runtime image, nor does it > + change kernel run time text sections. > + > +#. Profiling: The above kernel is then run with a representative > + workload to gather execution frequency data. This data is collected > + using hardware sampling, via perf. Propeller is most effective on > + platforms supporting advanced PMU features like LBR on Intel > + machines. This step is the same as profiling the kernel for AutoFDO > + (the exact perf parameters can be different). > + > +#. Propeller profile generation: Perf output file is converted to a > + pair of Propeller profiles via an offline tool. > + > +#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized > + binary as you would normally do, but with a compile-time / > + link-time flag to pick up the Propeller compile time and link time > + profiles. This build step uses 3 profiles - the AutoFDO profile, > + the Propeller compile-time profile and the Propeller link-time > + profile. > + > +#. Deployment: The optimized kernel binary is deployed and used > + in production environments, providing improved performance > + and reduced latency. > + > +Preparation > +=========== > + > +Configure the kernel with:: > + > + CONFIG_AUTOFDO_CLANG=y > + CONFIG_PROPELLER_CLANG=y > + > +Customization > +============= > + > +The default CONFIG_PROPELLER_CLANG setting covers kernel space objects > +for Propeller builds. One can, however, enable or disable Propeller build > +for individual files and directories by adding a line similar to the > +following to the respective kernel Makefile: > + > +- For enabling a single file (e.g. foo.o):: > + > + PROPELLER_PROFILE_foo.o := y > + > +- For enabling all files in one directory:: > + > + PROPELLER_PROFILE := y > + > +- For disabling one file:: > + > + PROPELLER_PROFILE_foo.o := n > + > +- For disabling all files in one directory:: > + > + PROPELLER__PROFILE := n > + > + > +Workflow > +======== > + > +Here is an example workflow for building an AutoFDO+Propeller kernel: > + > +1) Assuming an AutoFDO profile is already collected following > + instructions in the AutoFDO document, build the kernel on the host > + machine, with AutoFDO and Propeller build configs :: > + > + CONFIG_AUTOFDO_CLANG=y > + CONFIG_PROPELLER_CLANG=y > + > + and :: > + > + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> > + > +2) Install the kernel on the test machine. > + > +3) Run the load tests. The '-c' option in perf specifies the sample > + event period. We suggest using a suitable prime number, like 500009, > + for this purpose. > + > + - For Intel platforms:: > + > + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> > + > + - For AMD platforms:: > + > + $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> > + > + Note you can repeat the above steps to collect multiple <perf_file>s. > + > +4) (Optional) Download the raw perf file(s) to the host machine. > + > +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to > + generate Propeller profile. :: > + > + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> > + --format=propeller --propeller_output_module_name > + --out=<propeller_profile_prefix>_cc_profile.txt > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt Prevously I am using perf-6.8.5-0.hs1.hsx.el9.x86_64 and it works fine. Now in my system, the perf is upgraded to 6.12.gadc218676eef [root@twshared7248.15.atn5 ~]# perf --version perf version 6.12.gadc218676eef and create_llvm_prof does not work any more. The command to collect sampling data: # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s stress: info: [536354] dispatching hogs: 36 cpu, 36 io, 36 vm, 0 hdd stress: info: [536354] successful run completed in 300s [ perf record: Woken up 2210 times to write data ] [ perf record: Captured and wrote 562.529 MB perf.data (701971 samples) ] # uname -r 6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 The kernel is a 6.11 lto kernel. I then run the following command: $ cat ../run.sh # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 \ # -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s # good: perf-6.8.5-0.hs1.hsx.el9.x86_64 # <propeller_profile_prefix>: /tmp/propeller ./create_llvm_prof --binary=vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 \ --profile=perf.data \ --format=propeller --propeller_output_module_name \ --out=/tmp/propeller_cc_profile.txt \ --propeller_symorder=/tmp/propeller_ld_profile.txt $ ./run.sh WARNING: Logging before InitGoogleLogging() is written to STDERR I20241212 13:12:18.401772 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 I20241212 13:12:18.403692 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 I20241212 13:12:18.404873 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 I20241212 13:12:18.521499 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 I20241212 13:12:18.521530 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 I20241212 13:12:18.521553 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 I20241212 13:12:18.521611 463318 llvm_propeller_perf_lbr_aggregator.cc:51] Parsing [1/1] perf.data ... [ERROR:/home/runner/work/autofdo/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1386] Event size 132 after uint64_t alignment of the filename length is greater than event size 128 reported by perf for the buildid event of type 0 W20241212 13:12:18.521708 463318 llvm_propeller_perf_lbr_aggregator.cc:55] Skipped profile [1/1] perf.data: FAILED_PRECONDITION: Failed to read perf data file: [1/1] perf.data W20241212 13:12:18.521718 463318 llvm_propeller_perf_lbr_aggregator.cc:67] Too few branch records in perf data. E20241212 13:12:18.554437 463318 create_llvm_prof.cc:238] FAILED_PRECONDITION: No perf file is parsed, cannot proceed. Could you help take a look why perf 12 does not work with create_llvm_prof? The create_llvm_prof is downloaded from https://github.com/google/autofdo/releases/tag/v0.30.1. > + > + "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". > + > + This command generates a pair of Propeller profiles: > + "<propeller_profile_prefix>_cc_profile.txt" and > + "<propeller_profile_prefix>_ld_profile.txt". > + > + If there are more than 1 perf_file collected in the previous step, > + you can create a temp list file "<perf_file_list>" with each line > + containing one perf file name and run:: > + > + $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> > + --format=propeller --propeller_output_module_name > + --out=<propeller_profile_prefix>_cc_profile.txt > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt > + > +6) Rebuild the kernel using the AutoFDO and Propeller > + profiles. :: > + > + CONFIG_AUTOFDO_CLANG=y > + CONFIG_PROPELLER_CLANG=y > + > + and :: > + > + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> > diff --git a/MAINTAINERS b/MAINTAINERS > index d6ea49433747a..42e3af0791e15 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -18449,6 +18449,13 @@ S: Maintained > F: include/linux/psi* > F: kernel/sched/psi.c > [...]
On Thu, Dec 12, 2024 at 01:20:46PM -0800, Yonghong Song wrote: ... > > +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to > > + generate Propeller profile. :: > > + > > + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> > > + --format=propeller --propeller_output_module_name > > + --out=<propeller_profile_prefix>_cc_profile.txt > > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt > > Prevously I am using perf-6.8.5-0.hs1.hsx.el9.x86_64 and it works fine. > Now in my system, the perf is upgraded to 6.12.gadc218676eef > > [root@twshared7248.15.atn5 ~]# perf --version > perf version 6.12.gadc218676eef > > and create_llvm_prof does not work any more. > > The command to collect sampling data: > > # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s > stress: info: [536354] dispatching hogs: 36 cpu, 36 io, 36 vm, 0 hdd > stress: info: [536354] successful run completed in 300s > [ perf record: Woken up 2210 times to write data ] > [ perf record: Captured and wrote 562.529 MB perf.data (701971 samples) ] > # uname -r > 6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 > > The kernel is a 6.11 lto kernel. > > I then run the following command: > $ cat ../run.sh > # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 \ > # -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s > # good: perf-6.8.5-0.hs1.hsx.el9.x86_64 > > # <propeller_profile_prefix>: /tmp/propeller > ./create_llvm_prof --binary=vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 \ > --profile=perf.data \ > --format=propeller --propeller_output_module_name \ > --out=/tmp/propeller_cc_profile.txt \ > --propeller_symorder=/tmp/propeller_ld_profile.txt > > $ ./run.sh > WARNING: Logging before InitGoogleLogging() is written to STDERR > I20241212 13:12:18.401772 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 > I20241212 13:12:18.403692 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 > I20241212 13:12:18.404873 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 > I20241212 13:12:18.521499 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 > I20241212 13:12:18.521530 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 > I20241212 13:12:18.521553 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 > I20241212 13:12:18.521611 463318 llvm_propeller_perf_lbr_aggregator.cc:51] Parsing [1/1] perf.data ... > [ERROR:/home/runner/work/autofdo/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1386] Event size 132 after uint64_t alignment of the filename length is greater than event size 128 reported by perf for the buildid event of type 0 > W20241212 13:12:18.521708 463318 llvm_propeller_perf_lbr_aggregator.cc:55] Skipped profile [1/1] perf.data: FAILED_PRECONDITION: Failed to read perf data file: [1/1] perf.data > W20241212 13:12:18.521718 463318 llvm_propeller_perf_lbr_aggregator.cc:67] Too few branch records in perf data. > E20241212 13:12:18.554437 463318 create_llvm_prof.cc:238] FAILED_PRECONDITION: No perf file is parsed, cannot proceed. > > > Could you help take a look why perf 12 does not work with create_llvm_prof? > The create_llvm_prof is downloaded from https://github.com/google/autofdo/releases/tag/v0.30.1. I think Peter may have reported the same issue on GitHub? https://github.com/google/autofdo/issues/233 I wonder if this is a kernel side or perf tool regression? Cheers, Nathan
We will take a look at this issue and get back to you. Thanks for reporting this. Best Regards, -Rong On Thu, Dec 12, 2024 at 1:34 PM Nathan Chancellor <nathan@kernel.org> wrote: > > On Thu, Dec 12, 2024 at 01:20:46PM -0800, Yonghong Song wrote: > ... > > > +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to > > > + generate Propeller profile. :: > > > + > > > + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> > > > + --format=propeller --propeller_output_module_name > > > + --out=<propeller_profile_prefix>_cc_profile.txt > > > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt > > > > Prevously I am using perf-6.8.5-0.hs1.hsx.el9.x86_64 and it works fine. > > Now in my system, the perf is upgraded to 6.12.gadc218676eef > > > > [root@twshared7248.15.atn5 ~]# perf --version > > perf version 6.12.gadc218676eef > > > > and create_llvm_prof does not work any more. > > > > The command to collect sampling data: > > > > # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s > > stress: info: [536354] dispatching hogs: 36 cpu, 36 io, 36 vm, 0 hdd > > stress: info: [536354] successful run completed in 300s > > [ perf record: Woken up 2210 times to write data ] > > [ perf record: Captured and wrote 562.529 MB perf.data (701971 samples) ] > > # uname -r > > 6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 > > > > The kernel is a 6.11 lto kernel. > > > > I then run the following command: > > $ cat ../run.sh > > # perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c 500009 \ > > # -- stress --cpu 36 --io 36 --vm 36 --vm-bytes 128M --timeout 300s > > # good: perf-6.8.5-0.hs1.hsx.el9.x86_64 > > > > # <propeller_profile_prefix>: /tmp/propeller > > ./create_llvm_prof --binary=vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39 \ > > --profile=perf.data \ > > --format=propeller --propeller_output_module_name \ > > --out=/tmp/propeller_cc_profile.txt \ > > --propeller_symorder=/tmp/propeller_ld_profile.txt > > > > $ ./run.sh > > WARNING: Logging before InitGoogleLogging() is written to STDERR > > I20241212 13:12:18.401772 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 > > I20241212 13:12:18.403692 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 > > I20241212 13:12:18.404873 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 > > I20241212 13:12:18.521499 463318 llvm_propeller_binary_content.cc:376] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is PIE: 0 > > I20241212 13:12:18.521530 463318 llvm_propeller_binary_content.cc:380] 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39' is relocatable: 0 > > I20241212 13:12:18.521553 463318 llvm_propeller_binary_content.cc:388] Build Id found in 'vmlinux-6.11.1-0_fbk0_lto_rc19_612_gb572dfac1b39': eaacd5a14abc48cf832b3ad0fa6c64635ab569a8 > > I20241212 13:12:18.521611 463318 llvm_propeller_perf_lbr_aggregator.cc:51] Parsing [1/1] perf.data ... > > [ERROR:/home/runner/work/autofdo/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1386] Event size 132 after uint64_t alignment of the filename length is greater than event size 128 reported by perf for the buildid event of type 0 > > W20241212 13:12:18.521708 463318 llvm_propeller_perf_lbr_aggregator.cc:55] Skipped profile [1/1] perf.data: FAILED_PRECONDITION: Failed to read perf data file: [1/1] perf.data > > W20241212 13:12:18.521718 463318 llvm_propeller_perf_lbr_aggregator.cc:67] Too few branch records in perf data. > > E20241212 13:12:18.554437 463318 create_llvm_prof.cc:238] FAILED_PRECONDITION: No perf file is parsed, cannot proceed. > > > > > > Could you help take a look why perf 12 does not work with create_llvm_prof? > > The create_llvm_prof is downloaded from https://github.com/google/autofdo/releases/tag/v0.30.1. > > I think Peter may have reported the same issue on GitHub? > > https://github.com/google/autofdo/issues/233 > > I wonder if this is a kernel side or perf tool regression? > > Cheers, > Nathan
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst index 6945644f7008a..3c0ac08b27091 100644 --- a/Documentation/dev-tools/index.rst +++ b/Documentation/dev-tools/index.rst @@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst checkuapi gpio-sloppy-logic-analyzer autofdo + propeller .. only:: subproject and html diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst new file mode 100644 index 0000000000000..92195958e3dbc --- /dev/null +++ b/Documentation/dev-tools/propeller.rst @@ -0,0 +1,162 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Using Propeller with the Linux kernel +===================================== + +This enables Propeller build support for the kernel when using Clang +compiler. Propeller is a profile-guided optimization (PGO) method used +to optimize binary executables. Like AutoFDO, it utilizes hardware +sampling to gather information about the frequency of execution of +different code paths within a binary. Unlike AutoFDO, this information +is then used right before linking phase to optimize (among others) +block layout within and across functions. + +A few important notes about adopting Propeller optimization: + +#. Although it can be used as a standalone optimization step, it is + strongly recommended to apply Propeller on top of AutoFDO, + AutoFDO+ThinLTO or Instrument FDO. The rest of this document + assumes this paradigm. + +#. Propeller uses another round of profiling on top of + AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves + "build-afdo - train-afdo - build-propeller - train-propeller - + build-optimized". + +#. Propeller requires LLVM 19 release or later for Clang/Clang++ + and the linker(ld.lld). + +#. In addition to LLVM toolchain, Propeller requires a profiling + conversion tool: https://github.com/google/autofdo with a release + after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1. + +The Propeller optimization process involves the following steps: + +#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as + you would normally do, but with a set of compile-time / link-time + flags, so that a special metadata section is created within the + kernel binary. The special section is only intend to be used by the + profiling tool, it is not part of the runtime image, nor does it + change kernel run time text sections. + +#. Profiling: The above kernel is then run with a representative + workload to gather execution frequency data. This data is collected + using hardware sampling, via perf. Propeller is most effective on + platforms supporting advanced PMU features like LBR on Intel + machines. This step is the same as profiling the kernel for AutoFDO + (the exact perf parameters can be different). + +#. Propeller profile generation: Perf output file is converted to a + pair of Propeller profiles via an offline tool. + +#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized + binary as you would normally do, but with a compile-time / + link-time flag to pick up the Propeller compile time and link time + profiles. This build step uses 3 profiles - the AutoFDO profile, + the Propeller compile-time profile and the Propeller link-time + profile. + +#. Deployment: The optimized kernel binary is deployed and used + in production environments, providing improved performance + and reduced latency. + +Preparation +=========== + +Configure the kernel with:: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + +Customization +============= + +The default CONFIG_PROPELLER_CLANG setting covers kernel space objects +for Propeller builds. One can, however, enable or disable Propeller build +for individual files and directories by adding a line similar to the +following to the respective kernel Makefile: + +- For enabling a single file (e.g. foo.o):: + + PROPELLER_PROFILE_foo.o := y + +- For enabling all files in one directory:: + + PROPELLER_PROFILE := y + +- For disabling one file:: + + PROPELLER_PROFILE_foo.o := n + +- For disabling all files in one directory:: + + PROPELLER__PROFILE := n + + +Workflow +======== + +Here is an example workflow for building an AutoFDO+Propeller kernel: + +1) Assuming an AutoFDO profile is already collected following + instructions in the AutoFDO document, build the kernel on the host + machine, with AutoFDO and Propeller build configs :: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + + and :: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> + +2) Install the kernel on the test machine. + +3) Run the load tests. The '-c' option in perf specifies the sample + event period. We suggest using a suitable prime number, like 500009, + for this purpose. + + - For Intel platforms:: + + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + - For AMD platforms:: + + $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + Note you can repeat the above steps to collect multiple <perf_file>s. + +4) (Optional) Download the raw perf file(s) to the host machine. + +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to + generate Propeller profile. :: + + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> + --format=propeller --propeller_output_module_name + --out=<propeller_profile_prefix>_cc_profile.txt + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt + + "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". + + This command generates a pair of Propeller profiles: + "<propeller_profile_prefix>_cc_profile.txt" and + "<propeller_profile_prefix>_ld_profile.txt". + + If there are more than 1 perf_file collected in the previous step, + you can create a temp list file "<perf_file_list>" with each line + containing one perf file name and run:: + + $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> + --format=propeller --propeller_output_module_name + --out=<propeller_profile_prefix>_cc_profile.txt + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt + +6) Rebuild the kernel using the AutoFDO and Propeller + profiles. :: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + + and :: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> diff --git a/MAINTAINERS b/MAINTAINERS index d6ea49433747a..42e3af0791e15 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -18449,6 +18449,13 @@ S: Maintained F: include/linux/psi* F: kernel/sched/psi.c +PROPELLER BUILD +M: Rong Xu <xur@google.com> +M: Han Shen <shenhan@google.com> +S: Supported +F: Documentation/dev-tools/propeller.rst +F: scripts/Makefile.propeller + PRINTK M: Petr Mladek <pmladek@suse.com> R: Steven Rostedt <rostedt@goodmis.org> diff --git a/Makefile b/Makefile index b89d87b5dca79..b2830b27e1a7f 100644 --- a/Makefile +++ b/Makefile @@ -1019,6 +1019,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan include-$(CONFIG_KCOV) += scripts/Makefile.kcov include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo +include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins include $(addprefix $(srctree)/, $(include-y)) diff --git a/arch/Kconfig b/arch/Kconfig index 8dca3b5e6ef53..00551f340dbe3 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -831,6 +831,25 @@ config AUTOFDO_CLANG If unsure, say N. +config ARCH_SUPPORTS_PROPELLER_CLANG + bool + +config PROPELLER_CLANG + bool "Enable Clang's Propeller build" + depends on ARCH_SUPPORTS_PROPELLER_CLANG + depends on CC_IS_CLANG && CLANG_VERSION >= 190000 + help + This option enables Clang’s Propeller build. When the Propeller + profiles is specified in variable CLANG_PROPELLER_PROFILE_PREFIX + during the build process, Clang uses the profiles to optimize + the kernel. + + If no profile is specified, Propeller options are still passed + to Clang to facilitate the collection of perf data for creating + the Propeller profiles in subsequent builds. + + If unsure, say N. + config ARCH_SUPPORTS_CFI_CLANG bool help diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 9dc87661fb373..89b8fc452a7cf 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -127,6 +127,7 @@ config X86 select ARCH_SUPPORTS_LTO_CLANG_THIN select ARCH_SUPPORTS_RT select ARCH_SUPPORTS_AUTOFDO_CLANG + select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64 select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64 select ARCH_USE_MEMTEST diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index b8c5741d2fb48..cf22081601ed6 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -443,6 +443,10 @@ SECTIONS STABS_DEBUG DWARF_DEBUG +#ifdef CONFIG_PROPELLER_CLANG + .llvm_bb_addr_map : { *(.llvm_bb_addr_map) } +#endif + ELF_DETAILS DISCARDS diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 8a0bb3946cf05..c995474e4c649 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -95,14 +95,14 @@ * With LTO_CLANG, the linker also splits sections by default, so we need * these macros to combine the sections during the final link. * - * With AUTOFDO_CLANG, by default, the linker splits text sections and - * regroups functions into subsections. + * With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits + * text sections and regroups functions into subsections. * * RODATA_MAIN is not used because existing code already defines .rodata.x * sections to be brought in with rodata. */ #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \ -defined(CONFIG_AUTOFDO_CLANG) +defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG) #define TEXT_MAIN .text .text.[0-9a-zA-Z_]* #else #define TEXT_MAIN .text diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 2d0942c1a0277..e7859ad90224a 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -201,6 +201,16 @@ _c_flags += $(if $(patsubst n%,, \ $(CFLAGS_AUTOFDO_CLANG)) endif +# +# Enable Propeller build flags except some files or directories we don't want to +# enable (depends on variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE). +# +ifdef CONFIG_PROPELLER_CLANG +_c_flags += $(if $(patsubst n%,, \ + $(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \ + $(CFLAGS_PROPELLER_CLANG)) +endif + # $(src) for including checkin headers from generated source files # $(obj) for including generated headers from checkin source files ifeq ($(KBUILD_EXTMOD),) diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller new file mode 100644 index 0000000000000..344190717e471 --- /dev/null +++ b/scripts/Makefile.propeller @@ -0,0 +1,28 @@ +# SPDX-License-Identifier: GPL-2.0 + +# Enable available and selected Clang Propeller features. +ifdef CLANG_PROPELLER_PROFILE_PREFIX + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections + KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering +else + CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels +endif + +# Propeller requires debug information to embed module names in the profiles. +# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO, +# as the option should already be set. +ifndef CONFIG_DEBUG_INFO + ifndef CONFIG_AUTOFDO_CLANG + CFLAGS_PROPELLER_CLANG += -gmlt + endif +endif + +ifdef CONFIG_LTO_CLANG_THIN + ifdef CLANG_PROPELLER_PROFILE_PREFIX + KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt + else + KBUILD_LDFLAGS += --lto-basic-block-sections=labels + endif +endif + +export CFLAGS_PROPELLER_CLANG diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 4c5229991e1e0..05a0fb4a3d1a0 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -4558,6 +4558,7 @@ static int validate_ibt(struct objtool_file *file) !strcmp(sec->name, "__mcount_loc") || !strcmp(sec->name, ".kcfi_traps") || !strcmp(sec->name, ".llvm.call-graph-profile") || + !strcmp(sec->name, ".llvm_bb_addr_map") || strstr(sec->name, "__patchable_function_entries")) continue;