
[GIT,PULL] arm64 updates for 6.2

Message ID 20221209112500.GA3116@willie-the-truck
State New

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-upstream

Message

Will Deacon Dec. 9, 2022, 11:25 a.m. UTC
Hi Linus,

Please pull these arm64 updates for the 6.2 merge window.

There's the usual summary in the tag, with the highlights this time being
support for dynamically enabling/disabling Clang's Shadow Call Stack at
boot and a long-awaited optimisation to the way in which we handle the
SVE register state on system call entry to avoid taking unnecessary traps
from userspace.

A few small things to note:

  * for-next/sysregs is shared with the KVM tree, since it conflicts
    with some rework of the PMU handling code in the hypervisor.

  * We have some refactoring changes to the core ftrace code reworking
    FTRACE_WITH_REGS and fixing up PowerPC, S390 and x86 (acked by Steve
    and Masami). We build on top of this to enable FTRACE_WITH_ARGS on
    arm64.

  * We dropped for-next/uaccess fairly late in the cycle after Syzkaller
    ran into an issue with the fault handling in copy_to_user(). Hopefully
    we'll bring it back next time once we've got deterministic testing
    coverage of all the exception fixups.

It's all been sitting happily in -next for a while now.

Cheers,

Will

--->8

The following changes since commit f0c4d9fc9cc9462659728d168387191387e903cc:

  Linux 6.1-rc4 (2022-11-06 15:07:11 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-upstream

for you to fetch changes up to 5f4c374760b031f06c69c2fdad1b0e981a1ad42f:

  Merge branch 'for-next/undef-traps' into for-next/core (2022-12-06 11:34:25 +0000)

----------------------------------------------------------------
arm64 updates for 6.2

ACPI:
	* Enable FPDT support for boot-time profiling
	* Fix CPU PMU probing to work better with PREEMPT_RT
	* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
	* APMT support for probing Arm CoreSight PMU devices

CPU features:
	* Advertise new SVE instructions (v2.1)
	* Advertise range prefetch instruction
	* Advertise CSSC ("Common Short Sequence Compression") scalar
	  instructions, adding things like min, max, abs, popcount
	* Enable DIT (Data Independent Timing) when running in the kernel
	* More conversion of system register fields over to the generated
	  header

CPU misfeatures:
	* Workaround for Cortex-A715 erratum #2645198

Dynamic SCS:
	* Support for dynamic shadow call stacks to allow switching at
	  runtime between Clang's SCS implementation and the CPU's
	  pointer authentication feature when it is supported (complete
	  with scary DWARF parser!)

Tracing and debug:
	* Remove static ftrace in favour of, err, dynamic ftrace!
	* Separate 'struct ftrace_regs' from 'struct pt_regs' in core
	  ftrace and existing arch code
	* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
	  the old FTRACE_WITH_REGS
	* Extend 'crashkernel=' parameter with default value and fallback
	  to placement above 4G physical if initial (low) allocation
	  fails

SVE:
	* Optimisation to avoid disabling SVE unconditionally on syscall
	  entry and just zeroing the non-shared state on return instead

Exceptions:
	* Rework of undefined instruction handling to avoid serialisation
	  on global lock (this includes emulation of user accesses to the
	  ID registers)

Perf and PMU:
	* Support for TLP filters in Hisilicon's PCIe PMU device
	* Support for the DDR PMU present in Amlogic Meson G12 SoCs
	* Support for the terribly-named "CoreSight PMU" architecture
	  from Arm (and Nvidia's implementation of said architecture)

Misc:
	* Tighten up our boot protocol for systems with memory above
	  52 bits physical
	* Const-ify static keys to satisfy jump label asm constraints
	* Trivial FFA driver cleanups in preparation for v1.1 support
	* Export the kernel_neon_* APIs as GPL symbols
	* Harden our instruction generation routines against
	  instrumentation
	* A bunch of robustness improvements to our arch-specific selftests
	* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)

----------------------------------------------------------------
Anshuman Khandual (9):
      arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS
      arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
      arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
      arm64/mm: Drop idmap_pg_end[] declaration
      arm64/mm: Drop unused restore_ttbr1
      arm64: Add Cortex-715 CPU part definition
      arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
      arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
      arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()

Ard Biesheuvel (5):
      arm64: Enable data independent timing (DIT) in the kernel
      arm64: unwind: add asynchronous unwind tables to kernel and modules
      scs: add support for dynamic shadow call stacks
      arm64: implement dynamic shadow call stack for Clang
      arm64: booting: Require placement within 48-bit addressable memory

Bagas Sanjaya (1):
      Documentation: perf: Indent filter options list of hisi-pcie-pmu

Besar Wicaksono (6):
      ACPI: ARM Performance Monitoring Unit Table (APMT) initial support
      ACPI: APMT: Fix kerneldoc and indentation
      perf: arm_cspmu: Add support for ARM CoreSight PMU driver
      perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
      perf: arm_cspmu: Fix build failure on x86_64
      perf: arm_cspmu: Fix module cyclic dependency

James Morse (38):
      arm64/sysreg: Standardise naming for ID_MMFR0_EL1
      arm64/sysreg: Standardise naming for ID_MMFR4_EL1
      arm64/sysreg: Standardise naming for ID_MMFR5_EL1
      arm64/sysreg: Standardise naming for ID_ISAR0_EL1
      arm64/sysreg: Standardise naming for ID_ISAR4_EL1
      arm64/sysreg: Standardise naming for ID_ISAR5_EL1
      arm64/sysreg: Standardise naming for ID_ISAR6_EL1
      arm64/sysreg: Standardise naming for ID_PFR0_EL1
      arm64/sysreg: Standardise naming for ID_PFR1_EL1
      arm64/sysreg: Standardise naming for ID_PFR2_EL1
      arm64/sysreg: Standardise naming for ID_DFR0_EL1
      arm64/sysreg: Standardise naming for ID_DFR1_EL1
      arm64/sysreg: Standardise naming for MVFR0_EL1
      arm64/sysreg: Standardise naming for MVFR1_EL1
      arm64/sysreg: Standardise naming for MVFR2_EL1
      arm64/sysreg: Extend the maximum width of a register and symbol name
      arm64/sysreg: Convert ID_MMFR0_EL1 to automatic generation
      arm64/sysreg: Convert ID_MMFR1_EL1 to automatic generation
      arm64/sysreg: Convert ID_MMFR2_EL1 to automatic generation
      arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation
      arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation
      arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation
      arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
      arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
      arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
      arm64/sysreg: Convert MVFR0_EL1 to automatic generation
      arm64/sysreg: Convert MVFR1_EL1 to automatic generation
      arm64/sysreg: Convert MVFR2_EL1 to automatic generation
      arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
      arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
      arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
      arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation

Jeremy Linton (1):
      ACPI: Enable FPDT on arm64

Jiapeng Chong (1):
      perf/amlogic: Remove unused header inclusions of  <linux/version.h>

Jisheng Zhang (3):
      arm64: jump_label: mark arguments as const to satisfy asm constraints
      arm64: alternative: constify alternative_has_feature_* argument
      arm64: alternatives: add __init/__initconst to some functions/variables

Jiucheng Xu (4):
      perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
      docs/perf: Add documentation for the Amlogic G12 DDR PMU
      dt-binding: perf: Add Amlogic DDR PMU
      perf/amlogic: Fix build error for x86_64 allmodconfig

Kang Minchul (1):
      kselftest/arm64: fix array_size.cocci warning

Mark Brown (29):
      arm64/asm: Remove unused enable_da macro
      arm64/booting: Add missing colon to FA64 entry
      kselftest/arm64: Check that all children are producing output in fp-stress
      kselftest/arm64: Provide progress messages when signalling children
      kselftest/arm64: Remove validation of extra_context from TODO
      kselftest/arm64: Print ASCII version of unknown signal frame magic values
      arm64/fpsimd: Make kernel_neon_ API _GPL
      arm64/hwcap: Add support for FEAT_CSSC
      kselftest/arm64: Add FEAT_CSSC to the hwcap selftest
      arm64/hwcap: Add support for FEAT_RPRFM
      kselftest/arm64: Add FEAT_RPRFM to the hwcap test
      arm64/hwcap: Add support for SVE 2.1
      kselftest/arm64: Add SVE 2.1 to hwcap test
      arm64/signal: Document our convention for choosing magic numbers
      kselftest/arm64: Use preferred form for predicate load/stores
      arm64/kpti: Move DAIF masking to C code
      arm64/asm: Remove unused assembler DAIF save/restore macros
      kselftest/arm64: Set test names prior to starting children
      KVM: arm64: Discard any SVE state when entering KVM guests
      arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE
      arm64/fpsimd: Have KVM explicitly say which FP registers to save
      arm64/fpsimd: Stop using TIF_SVE to manage register saving in KVM
      arm64/fpsimd: Load FP state based on recorded data type
      arm64/fpsimd: SME no longer requires SVE register state
      arm64/sve: Leave SVE enabled on syscall if we don't context switch
      arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()
      kselftest/arm64: Hold fp-stress children until they're all spawned
      kselftest/arm64: Don't drain output while spawning children
      kselftest/arm64: Allow epoll_wait() to return more than one result

Mark Rutland (28):
      arm_pmu: acpi: factor out PMU<->CPU association
      arm_pmu: factor out PMU matching
      arm_pmu: rework ACPI probing
      arm_pmu: acpi: handle allocation failure
      arm64: atomics: lse: remove stale dependency on JUMP_LABEL
      arm64: make is_ttbrX_addr() noinstr-safe
      arm64: insn: remove aarch64_insn_gen_prefetch()
      arm64: insn: always inline predicates
      arm64: insn: simplify insn group identification
      arm64: insn: always inline hint generation
      arm64: mm: kfence: only handle translation faults
      arm64: allow kprobes on EL0 handlers
      arm64: split EL0/EL1 UNDEF handlers
      arm64: factor out EL1 SSBS emulation hook
      arm64: factor insn read out of call_undef_hook()
      arm64: rework EL0 MRS emulation
      arm64: armv8_deprecated: fold ops into insn_emulation
      arm64: armv8_deprecated move emulation functions
      arm64: armv8_deprecated: move aarch32 helper earlier
      arm64: armv8_deprecated: rework deprected instruction handling
      ftrace: pass fregs to arch_ftrace_set_direct_caller()
      ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
      ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
      ftrace: arm64: move from REGS to ARGS
      arm64: alternatives: make apply_alternatives_vdso() static
      arm64: remove current_top_of_stack()
      arm64: move on_thread_stack() to <asm/stacktrace.h>
      ftrace: arm64: remove static ftrace

Masahiro Yamada (1):
      arm64: remove special treatment for the link order of head.o

Masami Hiramatsu (Google) (3):
      arm64: Prohibit instrumentation on arch_stack_walk()
      arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
      arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK

Mukesh Ojha (1):
      arm64: entry: Fix typo

Ren Zhijie (1):
      arm64: armv8_deprecated: fix unused-function error

Robin Murphy (1):
      ACPI/IORT: Update SMMUv3 DeviceID support

Shang XiaoJing (2):
      perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
      perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()

Shaokun Zhang (1):
      MAINTAINERS: Update HiSilicon PMU maintainers

Usama Arif (1):
      arm64: paravirt: remove conduit check in has_pv_steal_clock

Will Deacon (25):
      perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s
      Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"
      firmware: arm_ffa: Move constants to header file
      firmware: arm_ffa: Move comment before the field it is documenting
      arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
      Merge branch 'for-next/acpi' into for-next/core
      Merge branch 'for-next/asm-const' into for-next/core
      Merge branch 'for-next/cpufeature' into for-next/core
      Merge branch 'for-next/dynamic-scs' into for-next/core
      Merge branch 'for-next/errata' into for-next/core
      Merge branch 'for-next/ffa' into for-next/core
      Merge branch 'for-next/fpsimd' into for-next/core
      Merge branch 'for-next/ftrace' into for-next/core
      Merge branch 'for-next/insn' into for-next/core
      Merge branch 'for-next/kbuild' into for-next/core
      Merge branch 'for-next/kdump' into for-next/core
      Merge branch 'for-next/kprobes' into for-next/core
      Merge branch 'for-next/mm' into for-next/core
      Merge branch 'for-next/perf' into for-next/core
      Merge branch 'for-next/selftests' into for-next/core
      Merge branch 'for-next/stacks' into for-next/core
      Merge branch 'for-next/sve-state' into for-next/core
      Merge branch 'for-next/sysregs' into for-next/core
      Merge branch 'for-next/trivial' into for-next/core
      Merge branch 'for-next/undef-traps' into for-next/core

Yicong Yang (3):
      drivers/perf: hisi: Fix some event id for hisi-pcie-pmu
      docs: perf: Fix PMU instance name of hisi-pcie-pmu
      drivers/perf: hisi: Add TLP filter support

Yuan Can (2):
      perf: arm_dsu: Fix hotplug callback leak in dsu_pmu_init()
      drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()

Zhen Lei (2):
      arm64: kdump: Provide default size when crashkernel=Y,low is not specified
      arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

junhua huang (1):
      arm64:uprobe fix the uprobe SWBP_INSN in big-endian

wangkailong@jari.cn (1):
      kselftest/arm64: fix array_size.cocci warning

 Documentation/admin-guide/kernel-parameters.txt    |   15 +-
 Documentation/admin-guide/perf/hisi-pcie-pmu.rst   |  112 +-
 Documentation/admin-guide/perf/index.rst           |    2 +
 Documentation/admin-guide/perf/meson-ddr-pmu.rst   |   70 ++
 Documentation/admin-guide/perf/nvidia-pmu.rst      |  299 +++++
 Documentation/arm64/acpi_object_usage.rst          |    2 +-
 Documentation/arm64/booting.rst                    |    7 +-
 Documentation/arm64/elf_hwcaps.rst                 |    9 +
 Documentation/arm64/silicon-errata.rst             |    2 +
 Documentation/arm64/sve.rst                        |    1 +
 .../bindings/perf/amlogic,g12-ddr-pmu.yaml         |   54 +
 MAINTAINERS                                        |   12 +-
 Makefile                                           |    2 +
 arch/Kconfig                                       |    7 +
 arch/arm64/Kconfig                                 |   49 +-
 arch/arm64/Makefile                                |   17 +-
 arch/arm64/include/asm/alternative-macros.h        |    4 +-
 arch/arm64/include/asm/assembler.h                 |   33 +-
 arch/arm64/include/asm/cpufeature.h                |    3 +-
 arch/arm64/include/asm/cputype.h                   |    2 +
 arch/arm64/include/asm/exception.h                 |    7 +-
 arch/arm64/include/asm/fpsimd.h                    |   17 +-
 arch/arm64/include/asm/ftrace.h                    |   72 +-
 arch/arm64/include/asm/hugetlb.h                   |    9 +
 arch/arm64/include/asm/hwcap.h                     |    3 +
 arch/arm64/include/asm/insn.h                      |  156 ++-
 arch/arm64/include/asm/jump_label.h                |    8 +-
 arch/arm64/include/asm/kernel-pgtable.h            |   11 +-
 arch/arm64/include/asm/kvm_host.h                  |   12 +-
 arch/arm64/include/asm/lse.h                       |    1 -
 arch/arm64/include/asm/mmu_context.h               |   10 +
 arch/arm64/include/asm/module.lds.h                |    8 +
 arch/arm64/include/asm/pgtable-hwdef.h             |    1 +
 arch/arm64/include/asm/pgtable.h                   |   14 +-
 arch/arm64/include/asm/processor.h                 |   24 +-
 arch/arm64/include/asm/scs.h                       |   49 +
 arch/arm64/include/asm/spectre.h                   |    2 +
 arch/arm64/include/asm/stacktrace.h                |    2 +
 arch/arm64/include/asm/sysreg.h                    |  150 +--
 arch/arm64/include/asm/traps.h                     |   19 +-
 arch/arm64/include/asm/uprobes.h                   |    2 +-
 arch/arm64/include/uapi/asm/hwcap.h                |    3 +
 arch/arm64/include/uapi/asm/sigcontext.h           |    4 +
 arch/arm64/kernel/Makefile                         |    2 +
 arch/arm64/kernel/alternative.c                    |    6 +-
 arch/arm64/kernel/armv8_deprecated.c               |  567 +++++----
 arch/arm64/kernel/asm-offsets.c                    |   13 +
 arch/arm64/kernel/cpu_errata.c                     |    7 +
 arch/arm64/kernel/cpufeature.c                     |  253 ++--
 arch/arm64/kernel/cpuinfo.c                        |    3 +
 arch/arm64/kernel/entry-common.c                   |   24 +-
 arch/arm64/kernel/entry-ftrace.S                   |  156 +--
 arch/arm64/kernel/entry.S                          |    3 +
 arch/arm64/kernel/fpsimd.c                         |  169 ++-
 arch/arm64/kernel/ftrace.c                         |   87 +-
 arch/arm64/kernel/head.S                           |    3 +
 arch/arm64/kernel/irq.c                            |   11 +-
 arch/arm64/kernel/module.c                         |   11 +-
 arch/arm64/kernel/paravirt.c                       |    4 -
 arch/arm64/kernel/patch-scs.c                      |  257 ++++
 arch/arm64/kernel/perf_event.c                     |    3 +-
 arch/arm64/kernel/pi/Makefile                      |    1 +
 arch/arm64/kernel/probes/decode-insn.c             |    2 +-
 arch/arm64/kernel/probes/kprobes.c                 |   86 +-
 arch/arm64/kernel/process.c                        |    2 +
 arch/arm64/kernel/proton-pack.c                    |   26 +-
 arch/arm64/kernel/ptrace.c                         |    5 +-
 arch/arm64/kernel/sdei.c                           |    2 +-
 arch/arm64/kernel/setup.c                          |    4 +
 arch/arm64/kernel/signal.c                         |    7 +-
 arch/arm64/kernel/stacktrace.c                     |   10 +-
 arch/arm64/kernel/suspend.c                        |    2 +
 arch/arm64/kernel/syscall.c                        |   19 +-
 arch/arm64/kernel/traps.c                          |   93 +-
 arch/arm64/kernel/vmlinux.lds.S                    |   13 +
 arch/arm64/kvm/fpsimd.c                            |   26 +-
 arch/arm64/kvm/hyp/nvhe/Makefile                   |    1 +
 arch/arm64/kvm/sys_regs.c                          |    4 +-
 arch/arm64/lib/insn.c                              |  165 ---
 arch/arm64/lib/mte.S                               |    2 +-
 arch/arm64/mm/fault.c                              |    8 +-
 arch/arm64/mm/hugetlbpage.c                        |   21 +
 arch/arm64/mm/init.c                               |   25 +-
 arch/arm64/mm/mmu.c                                |   23 +-
 arch/arm64/mm/proc.S                               |    4 -
 arch/arm64/tools/cpucaps                           |    2 +
 arch/arm64/tools/gen-sysreg.awk                    |    2 +-
 arch/arm64/tools/sysreg                            |  766 +++++++++++-
 arch/powerpc/include/asm/ftrace.h                  |   24 +-
 arch/s390/include/asm/ftrace.h                     |   29 +-
 arch/x86/include/asm/ftrace.h                      |   49 +-
 drivers/acpi/Kconfig                               |    2 +-
 drivers/acpi/arm64/Kconfig                         |    3 +
 drivers/acpi/arm64/Makefile                        |    1 +
 drivers/acpi/arm64/apmt.c                          |  178 +++
 drivers/acpi/arm64/iort.c                          |   16 +-
 drivers/acpi/bus.c                                 |    2 +
 drivers/firmware/arm_ffa/driver.c                  |  101 +-
 drivers/firmware/efi/libstub/Makefile              |    1 +
 drivers/perf/Kconfig                               |    4 +
 drivers/perf/Makefile                              |    2 +
 drivers/perf/amlogic/Kconfig                       |   10 +
 drivers/perf/amlogic/Makefile                      |    5 +
 drivers/perf/amlogic/meson_ddr_pmu_core.c          |  561 +++++++++
 drivers/perf/amlogic/meson_g12_ddr_pmu.c           |  394 ++++++
 drivers/perf/arm_cspmu/Kconfig                     |   13 +
 drivers/perf/arm_cspmu/Makefile                    |    6 +
 drivers/perf/arm_cspmu/arm_cspmu.c                 | 1303 ++++++++++++++++++++
 drivers/perf/arm_cspmu/arm_cspmu.h                 |  151 +++
 drivers/perf/arm_cspmu/nvidia_cspmu.c              |  400 ++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.h              |   17 +
 drivers/perf/arm_dmc620_pmu.c                      |    8 +-
 drivers/perf/arm_dsu_pmu.c                         |    6 +-
 drivers/perf/arm_pmu.c                             |   20 +-
 drivers/perf/arm_pmu_acpi.c                        |  114 +-
 drivers/perf/arm_smmuv3_pmu.c                      |    8 +-
 drivers/perf/hisilicon/hisi_pcie_pmu.c             |   22 +-
 drivers/perf/marvell_cn10k_tad_pmu.c               |    6 +-
 include/asm-generic/vmlinux.lds.h                  |    9 +-
 include/linux/acpi_apmt.h                          |   19 +
 include/linux/arm_ffa.h                            |   85 +-
 include/linux/ftrace.h                             |   47 +-
 include/linux/perf/arm_pmu.h                       |    1 -
 include/linux/scs.h                                |   18 +
 include/soc/amlogic/meson_ddr_pmu.h                |   66 +
 kernel/livepatch/patch.c                           |    2 +-
 kernel/scs.c                                       |   14 +-
 kernel/trace/Kconfig                               |    6 +-
 kernel/trace/ftrace.c                              |    3 +-
 scripts/head-object-list.txt                       |    1 -
 scripts/module.lds.S                               |    6 +
 tools/testing/selftests/arm64/abi/hwcap.c          |   32 +
 .../testing/selftests/arm64/abi/syscall-abi-asm.S  |    4 +-
 tools/testing/selftests/arm64/fp/fp-stress.c       |  120 +-
 .../selftests/arm64/mte/check_buffer_fill.c        |   12 +-
 .../selftests/arm64/mte/check_mmap_options.c       |    9 +-
 .../testing/selftests/arm64/signal/testcases/TODO  |    1 -
 .../selftests/arm64/signal/testcases/testcases.c   |   21 +-
 138 files changed, 6535 insertions(+), 1583 deletions(-)
 create mode 100644 Documentation/admin-guide/perf/meson-ddr-pmu.rst
 create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
 create mode 100644 Documentation/devicetree/bindings/perf/amlogic,g12-ddr-pmu.yaml
 create mode 100644 arch/arm64/kernel/patch-scs.c
 create mode 100644 drivers/acpi/arm64/apmt.c
 create mode 100644 drivers/perf/amlogic/Kconfig
 create mode 100644 drivers/perf/amlogic/Makefile
 create mode 100644 drivers/perf/amlogic/meson_ddr_pmu_core.c
 create mode 100644 drivers/perf/amlogic/meson_g12_ddr_pmu.c
 create mode 100644 drivers/perf/arm_cspmu/Kconfig
 create mode 100644 drivers/perf/arm_cspmu/Makefile
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
 create mode 100644 include/linux/acpi_apmt.h
 create mode 100644 include/soc/amlogic/meson_ddr_pmu.h

Comments

Linus Torvalds Dec. 12, 2022, 6:05 p.m. UTC | #1
On Fri, Dec 9, 2022 at 3:25 AM Will Deacon <will@kernel.org> wrote:
>
> Dynamic SCS:
>         * Support for dynamic shadow call stacks to allow switching at
>           runtime between Clang's SCS implementation and the CPU's
>           pointer authentication feature when it is supported (complete
>           with scary DWARF parser!)

I've pulled this thing, but this part makes me nervous. There's some
bad history with debug information not being 100% reliable probably
simply because it gets very little correctness testing.

It might be worth thinking about at least verifying the information
using something like objtool, so that you at least catch problem cases
at *build* time rather than runtime.

For example, that whole

    default:
        pr_err("unhandled opcode: %02x in FDE frame %lx\n",
               opcode[-1], (uintptr_t)frame);
        return -ENOEXEC;

really makes me go "this should have been verified at build time, it's
much too late to notice now that you don't understand the dwarf data".

Hmm?

                    Linus
pr-tracker-bot@kernel.org Dec. 12, 2022, 6:07 p.m. UTC | #2
The pull request you sent on Fri, 9 Dec 2022 11:25:01 +0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-upstream

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/06cff4a58e7dfa018c5f8a6ebdc3ff12745e0bae

Thank you!
Will Deacon Dec. 13, 2022, 12:11 p.m. UTC | #3
Hi Linus,

[+Ard]

On Mon, Dec 12, 2022 at 10:05:07AM -0800, Linus Torvalds wrote:
> On Fri, Dec 9, 2022 at 3:25 AM Will Deacon <will@kernel.org> wrote:
> >
> > Dynamic SCS:
> >         * Support for dynamic shadow call stacks to allow switching at
> >           runtime between Clang's SCS implementation and the CPU's
> >           pointer authentication feature when it is supported (complete
> >           with scary DWARF parser!)
> 
> I've pulled this thing, but this part makes me nervous. There's some
> bad history with debug information not being 100% reliable probably
> simply because it gets very little correctness testing.

Hey, I did use the word "scary"! This is, at least, very easy to back
out (it's effectively an optimisation) if the DWARF info ends up being
too unreliable and causes issues in practice. We're also only looking
at .eh_frame here, which should hopefully get a lot more correctness
testing when compared to the .debug sections due to exception unwinding.

> It might be worth thinking about at least verifying the information
> using something like objtool, so that you at least catch problem cases
> at *build* time rather than runtime.

Checking that the DWARF data looks sensible at build time isn't a bad
idea, but see below as I think we can probably still produce a functional
kernel Image in this case.

> For example, that whole
> 
>     default:
>         pr_err("unhandled opcode: %02x in FDE frame %lx\n",
> opcode[-1], (uintptr_t)frame);
>         return -ENOEXEC;
> 
> really makes me go "this should have been verified at build time, it's
> much too late to notice now that you don't understand the dwarf data".

This isn't actually as bad as it looks -- the patching operation here
only kicks in on CPUs which do not implement the pointer authentication
instructions (i.e. where the CPU executes these as NOPs). Therefore, if
patching bails out half way due to the "unhandled opcode" above, we
should be ok, albeit missing some SCS coverage. I say "should" because
if we fail within a frame after patching in the SCS "push" but before
patching in the "pop", then we'd end up with a corrupt SCS pointer.

Ard -- do you think we could tweak the patching so that we patch the push
and the pop together (e.g. by tracking the two locations on a per-frame
basis and postponing the text poking until just before we return from
scs_handle_fde_frame())?

Will
Ard Biesheuvel Dec. 13, 2022, 12:36 p.m. UTC | #4
On Tue, 13 Dec 2022 at 13:11, Will Deacon <will@kernel.org> wrote:
>
> Hi Linus,
>
> [+Ard]
>
> On Mon, Dec 12, 2022 at 10:05:07AM -0800, Linus Torvalds wrote:
> > On Fri, Dec 9, 2022 at 3:25 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > Dynamic SCS:
> > >         * Support for dynamic shadow call stacks to allow switching at
> > >           runtime between Clang's SCS implementation and the CPU's
> > >           pointer authentication feature when it is supported (complete
> > >           with scary DWARF parser!)
> >
> > I've pulled this thing, but this part makes me nervous. There's some
> > bad history with debug information not being 100% reliable probably
> > simply because it gets very little correctness testing.
>
> Hey, I did use the word "scary"! This is, at least, very easy to back
> out (it's effectively an optimisation) if the DWARF info ends up being
> too unreliable and causes issues in practice. We're also only looking
> at .eh_frame here, which should hopefully get a lot more correctness
> testing when compared to the .debug sections due to exception unwinding.
>

Indeed. And this is Clang 15+ at the moment, for precisely this reason.

> > It might be worth thinking about at least verifying the information
> > using something like objtool, so that you at least catch problem cases
> > at *build* time rather than runtime.
>
> Checking that the DWARF data looks sensible at build time isn't a bad
> idea, but see below as I think we can probably still produce a functional
> kernel Image in this case.
>
> > For example, that whole
> >
> >     default:
> >         pr_err("unhandled opcode: %02x in FDE frame %lx\n",
> > opcode[-1], (uintptr_t)frame);
> >         return -ENOEXEC;
> >
> > really makes me go "this should have been verified at build time, it's
> > much too late to notice now that you don't understand the dwarf data".
>
> This isn't actually as bad as it looks -- the patching operation here
> only kicks in on CPUs which do not implement the pointer authentication
> instructions (i.e. where the CPU executes these as NOPs). Therefore, if
> patching bails out half way due to the "unhandled opcode" above, we
> should be ok, albeit missing some SCS coverage.

Indeed.

> I say "should" because
> if we fail within a frame after patching in the SCS "push" but before
> patching in the "pop", then we'd end up with a corrupt SCS pointer.
>
> Ard -- do you think we could tweak the patching so that we patch the push
> and the pop together (e.g. by tracking the two locations on a per-frame
> basis and postponing the text poking until just before we return from
> scs_handle_fde_frame())?
>

The push and the pop are not necessarily balanced (there may be more
than one pop for each push), and the opcode we look for
(DW_CFA_negate_ra_state) may occur in places which are not actually a
pop, so tracking these is not as straight-forward as this.

What we could do is track the push and the first pop on a first pass,
and if we don't encounter any unexpected opcodes, patch the push and
do a second pass starting from the first pop. Or just simply run it
twice and do no patching the first time around (the DWARF frames are
not very big).
Will Deacon Dec. 13, 2022, 12:52 p.m. UTC | #5
On Tue, Dec 13, 2022 at 01:36:09PM +0100, Ard Biesheuvel wrote:
> On Tue, 13 Dec 2022 at 13:11, Will Deacon <will@kernel.org> wrote:
> > Ard -- do you think we could tweak the patching so that we patch the push
> > and the pop together (e.g. by tracking the two locations on a per-frame
> > basis and postponing the text poking until just before we return from
> > scs_handle_fde_frame())?
> >
> 
> The push and the pop are not necessarily balanced (there may be more
> than one pop for each push), and the opcode we look for
> (DW_CFA_negate_ra_state) may occur in places which are not actually a
> pop, so tracking these is not as straight-forward as this.

Duh, yes, of course. You only _execute_ one of the pops for a given run
through the function, but there could be numerous return points. So my
idea doesn't work at all :)

> What we could do is track the push and the first pop on a first pass,
> and if we don't encounter any unexpected opcodes, patch the push and
> do a second pass starting from the first pop. Or just simply run it
> twice and do no patching the first time around (the DWARF frames are
> not very big)

Doing a dry-run first sounds fairly easy to implement, so it would probably
be a good starting point. It also means that if anybody complains about the
overhead, then we can get them to work on doing it at build time instead!

Will
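
The "dry-run first" idea from the thread above can be sketched in user-space
C as follows. This is only an illustration under stated assumptions, not the
code in arch/arm64/kernel/patch-scs.c: the opcode values, the toy_* names and
the patch counter are all hypothetical stand-ins, and "patching" is modelled
by bumping a counter instead of poking kernel text. The point it shows is
that a single patching pass can stop part-way through a frame on an unknown
opcode, whereas validating the whole frame before patching leaves the frame
either fully patched or untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes: hypothetical values, not the real DWARF CFA encoding. */
enum toy_op {
	TOY_NOP       = 0x00,
	TOY_ADVANCE   = 0x01,	/* followed by one operand byte */
	TOY_NEGATE_RA = 0x2d,	/* stands in for DW_CFA_negate_ra_state */
};

/*
 * Walk the opcode stream of one frame. When dry_run is non-zero the
 * stream is only validated; otherwise each TOY_NEGATE_RA is "patched",
 * which this sketch models by incrementing *npatched.
 * Returns 0 on success or -ENOEXEC on an opcode it does not understand.
 */
static int toy_walk_frame(const uint8_t *ops, size_t len, int dry_run,
			  size_t *npatched)
{
	size_t i = 0;

	assert(npatched != NULL);

	while (i < len) {
		uint8_t op = ops[i++];

		switch (op) {
		case TOY_NOP:
			break;
		case TOY_ADVANCE:
			if (i >= len)
				return -ENOEXEC;	/* truncated operand */
			i++;				/* skip the operand */
			break;
		case TOY_NEGATE_RA:
			if (!dry_run)
				(*npatched)++;
			break;
		default:
			return -ENOEXEC;		/* unhandled opcode */
		}
	}
	return 0;
}

/* Validate the whole frame first; patch only if validation succeeded. */
static int toy_patch_frame(const uint8_t *ops, size_t len, size_t *npatched)
{
	int ret = toy_walk_frame(ops, len, 1, npatched);

	if (ret)
		return ret;
	return toy_walk_frame(ops, len, 0, npatched);
}
```

With a frame such as { TOY_NEGATE_RA, 0xff, TOY_NEGATE_RA }, calling
toy_walk_frame() directly with dry_run == 0 patches one site before failing,
which is the half-patched state Will was worried about. toy_patch_frame()
rejects the same frame without patching anything. The cost is walking each
frame twice, which the thread suggests is acceptable because the frames are
small.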