From patchwork Tue Sep 24 11:29:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Magali Lemes
X-Patchwork-Id: 1988876
From: Magali Lemes
To: kernel-team@lists.ubuntu.com
Subject: [SRU][jammy:linux-gcp][PATCH 1/1] x86/kaslr: Expose and use the end
 of the physical memory address space
Date: Tue, 24 Sep 2024 08:29:32 -0300
Message-Id: <20240924112933.242923-3-magali.lemes@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240924112933.242923-1-magali.lemes@canonical.com>
References: <20240924112933.242923-1-magali.lemes@canonical.com>

From: Thomas Gleixner

BugLink: https://bugs.launchpad.net/bugs/2080563

iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this happens
due to KASLR.

KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space, because
otherwise only the holes between the direct map, vmalloc, vmemmap and
vaddr_end would be usable for randomizing.

The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.

request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and, if unlucky, this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.

MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually need to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.

Cure this by exposing the end of the direct map via PHYSMEM_END and using
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization, and
otherwise it is based on MAX_PHYSMEM_BITS as before.

To prevent future hiccups, add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.

Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski
Reported-by: Alistair Popple
Signed-off-by: Thomas Gleixner
Tested-by: Max Ramanouski
Tested-by: Alistair Popple
Reviewed-by: Dan Williams
Reviewed-by: Alistair Popple
Reviewed-by: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
(backported from commit ea72ce5da22806d5713f3ffb39a6d5ae73841f93)
[magalilemes: include the definition of `PHYSMEM_END` by hand. Also,
since gfr_start() and gfr_continue() don't exist here yet due to missing
upstream commit 14b80582c43e ("resource: Introduce
alloc_free_mem_region()"), there is no need to patch `kernel/resource.c`
as done in the original patch.]
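To make the failure mode concrete, here is a minimal userspace sketch of
the arithmetic described above. It is not kernel code: the 46-bit limit
(x86-64 with 4-level paging) and the installed-memory and hot-plug-margin
figures are illustrative assumptions, not values from this patch.

#include <stdio.h>
#include <stdint.h>

#define MAX_PHYSMEM_BITS	46	/* illustrative: x86-64, 4-level paging */

int main(void)
{
	/* Architectural limit the resource code assumes. */
	uint64_t arch_end = (1ULL << MAX_PHYSMEM_BITS) - 1;
	/* Assumed KASLR-trimmed direct-map end: 64 GB installed + 10 GB margin. */
	uint64_t trimmed_end = (64ULL + 10) << 30;

	printf("arch limit:       %#llx\n", (unsigned long long)arch_end);
	printf("trimmed phys end: %#llx\n", (unsigned long long)trimmed_end);

	/*
	 * A top-down allocation starting at arch_end begins far past the
	 * trimmed end, which is how it can collide with the randomized
	 * vmalloc region.
	 */
	if (arch_end > trimmed_end)
		printf("gap above trimmed end: %llu TB\n",
		       (unsigned long long)((arch_end - trimmed_end) >> 40));
	return 0;
}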
Signed-off-by: Magali Lemes
---
 arch/x86/include/asm/page_64.h          |  1 +
 arch/x86/include/asm/pgtable_64_types.h |  4 ++++
 arch/x86/mm/init_64.c                   |  4 ++++
 arch/x86/mm/kaslr.c                     | 32 ++++++++++++++++++++-----
 include/linux/mm.h                      |  4 ++++
 mm/memory_hotplug.c                     |  2 +-
 mm/sparse.c                             |  2 +-
 7 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 56891399fa2a..4aca7112c8e6 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,7 @@ extern unsigned long phys_base;
 extern unsigned long page_offset_base;
 extern unsigned long vmalloc_base;
 extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
 
 static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
 {
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 91ac10654570..f60800dcff0e 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -139,6 +139,10 @@ extern unsigned int ptrs_per_p4d;
 # define VMEMMAP_START		__VMEMMAP_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END		physmem_end
+#endif
+
 #define VMALLOC_END		(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
 
 #define MODULES_VADDR		(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 200ad5ceeb43..07d372c73ea5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -950,8 +950,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 	      struct mhp_params *params)
 {
+	unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
 	int ret;
 
+	if (WARN_ON_ONCE(end > PHYSMEM_END))
+		return -ERANGE;
+
 	ret = __add_pages(nid, start_pfn, nr_pages, params);
 	WARN_ON_ONCE(ret);
 
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 37db264866b6..230f1dee4f09 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
  */
 static __initdata struct kaslr_memory_region {
 	unsigned long *base;
+	unsigned long *end;
 	unsigned long size_tb;
 } kaslr_regions[] = {
-	{ &page_offset_base, 0 },
-	{ &vmalloc_base, 0 },
-	{ &vmemmap_base, 0 },
+	{
+		.base = &page_offset_base,
+		.end = &physmem_end,
+	},
+	{
+		.base = &vmalloc_base,
+	},
+	{
+		.base = &vmemmap_base,
+	},
 };
 
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
 /* Get size in bytes used by the memory region */
 static inline unsigned long get_padding(struct kaslr_memory_region *region)
 {
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void)
 	BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
 	BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
 
+	/* Preset the end of the possible address space for physical memory */
+	physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
 	if (!kaslr_memory_enabled())
 		return;
 
@@ -128,11 +141,18 @@ void __init kernel_randomize_memory(void)
 		vaddr += entropy;
 		*kaslr_regions[i].base = vaddr;
 
+		/* Calculate the end of the region */
+		vaddr += get_padding(&kaslr_regions[i]);
 		/*
-		 * Jump the region and add a minimum padding based on
-		 * randomization alignment.
+		 * KASLR trims the maximum possible size of the
+		 * direct-map. Update the physmem_end boundary.
+		 * No rounding required as the region starts
+		 * PUD aligned and size is in units of TB.
 		 */
-		vaddr += get_padding(&kaslr_regions[i]);
+		if (kaslr_regions[i].end)
+			*kaslr_regions[i].end = __pa_nodebug(vaddr - 1);
+
+		/* Add a minimum padding based on randomization alignment. */
 		vaddr = round_up(vaddr + 1, PUD_SIZE);
 		remain_entropy -= entropy;
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1e6dfc386548..399a2fa556d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,6 +98,10 @@ extern const int mmap_rnd_compat_bits_max;
 extern int mmap_rnd_compat_bits __read_mostly;
 #endif
 
+#ifndef PHYSMEM_END
+# define PHYSMEM_END	((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
 #include <asm/page.h>
 #include <asm/processor.h>
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2d8e9fb4ce0b..4047f39bf9a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1568,7 +1568,7 @@ struct range __weak arch_get_mappable_range(void)
 
 struct range mhp_get_pluggable_range(bool need_mapping)
 {
-	const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+	const u64 max_phys = PHYSMEM_END;
 	struct range mhp_range;
 
 	if (need_mapping) {
diff --git a/mm/sparse.c b/mm/sparse.c
index 27092badd15b..b94f08637044 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struct mem_section *section)
 void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
 						unsigned long *end_pfn)
 {
-	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+	unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
 
 	/*
 	 * Sanity checks - do not allow an architecture to pass
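As a sanity check on the mm/sparse.c hunk: when PHYSMEM_END takes its
default, non-KASLR form ((1ULL << MAX_PHYSMEM_BITS) - 1), the new
expression reduces exactly to the old one, since
((1 << B) - 1 + 1) >> PAGE_SHIFT == 1 << (B - PAGE_SHIFT). The following
standalone sketch verifies the equivalence; the constants are
illustrative assumptions, not kernel code.

#include <assert.h>
#include <stdint.h>

/* Illustrative values for x86-64 with 4-level paging. */
#define MAX_PHYSMEM_BITS	46
#define PAGE_SHIFT		12
#define PHYSMEM_END		((1ULL << MAX_PHYSMEM_BITS) - 1)

int main(void)
{
	/* Old mm/sparse.c computation. */
	uint64_t old_pfn = 1ULL << (MAX_PHYSMEM_BITS - PAGE_SHIFT);
	/* New computation, with PHYSMEM_END at its non-KASLR default. */
	uint64_t new_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;

	assert(old_pfn == new_pfn);
	return 0;
}

Under CONFIG_RANDOMIZE_MEMORY, PHYSMEM_END instead resolves to the
physmem_end variable set during KASLR initialization, so the sparsemem
limit tightens to the trimmed direct-map end.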