
[v2,5/5] powerpc/book3s64/memhotplug: Enable memmap on memory for radix

Message ID 20230706085041.826340-6-aneesh.kumar@linux.ibm.com (mailing list archive)
State Superseded
Series Add support for memmap on memory feature on ppc64

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu fail kernel (g5_defconfig, korg-5.5.0, /linux/arch/powerpc/configs/g5-qemu.config) failed at step Build.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.

Commit Message

Aneesh Kumar K V July 6, 2023, 8:50 a.m. UTC
Radix vmemmap mapping can map things correctly at the PMD or PTE level
based on different device boundary checks. We also use the altmap.reserve
feature to align things correctly at pageblock granularity. We can end up
losing some pages in memory with this. For example, with a 256MB memory
block size we require 4 pages to map the vmemmap pages; in order to align
things correctly we end up adding a reserve of 28 pages. That is, for
every 4096 pages, 28 pages get reserved.
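
As a back-of-the-envelope illustration of where the 4 and the 28 come from
(a sketch assuming a 64K base page size, a 64-byte struct page and a 2MB
pageblock; these constants are assumptions for illustration, not taken from
the patch):

	/* 256MB memory block with 64K base pages */
	nr_pages     = 256MB / 64K          = 4096 pages
	vmemmap_size = 4096 * 64 bytes      = 256KB  = 4 pages of 64K
	reserve      = pageblock_nr_pages - vmemmap pages
	             = (2MB / 64K) - 4      = 32 - 4 = 28 pages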

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
 .../platforms/pseries/hotplug-memory.c        |  4 ++-
 3 files changed, 32 insertions(+), 1 deletion(-)

Comments

David Hildenbrand July 6, 2023, 9:07 a.m. UTC | #1
On 06.07.23 10:50, Aneesh Kumar K.V wrote:
> Radix vmemmap mapping can map things correctly at the PMD or PTE level
> based on different device boundary checks. We also use the altmap.reserve
> feature to align things correctly at pageblock granularity. We can end up
> losing some pages in memory with this. For example, with a 256MB memory
> block size we require 4 pages to map the vmemmap pages; in order to align
> things correctly we end up adding a reserve of 28 pages. That is, for
> every 4096 pages, 28 pages get reserved.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/Kconfig                          |  1 +
>   arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
>   .../platforms/pseries/hotplug-memory.c        |  4 ++-
>   3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 116d6add0bb0..f890907e5bbf 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -157,6 +157,7 @@ config PPC
>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   	select ARCH_KEEP_MEMBLOCK
> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>   	select ARCH_MIGHT_HAVE_PC_SERIO
>   	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index a62729f70f2a..c0bd60b5fb64 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>   
>   	return 1;
>   }
> +
> +/*
> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
> + * some of the restrictions. We don't check for PMD_SIZE because our
> + * vmemmap allocation code can fallback correctly. The pageblock

x86 also has the fallback IIRC; the concern is rather that you end up 
with a PTE-mapped vmemmap, which is slower at runtime than a PMD-mapped 
vmemmap.

> + * alignment requirement is met using altmap->reserve blocks.
> + */
> +bool mhp_supports_memmap_on_memory(unsigned long size)
> +{
> +	if (!radix_enabled())
> +		return false;
> +	/*
> +	 * The pageblock alignment requirement is met by using
> +	 * reserve blocks in altmap.
> +	 */
> +	return size == memory_block_size_bytes();
> +}

I really dislike such arch overrides.

I think the flow should be something like this, having a generic:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
	...
	return arch_mhp_supports_memmap_on_memory(size) &&
	       size == memory_block_size_bytes() &&
	       ...
}

where we'll also keep the pageblock check here.

And a ppc-specific one:

bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	/*
	 * Don't check for the vmemmap covering PMD_SIZE, we accept that
	 * we might get a PTE-mapped vmemmap.
	 */
	return radix_enabled();
}

We can then move the PMD_SIZE check into arch specific code (x86/aarch64).
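
A rough sketch of how that split could look (function names and file
placement here are illustrative, not taken from an actual tree; the generic
helper keeps the size and alignment checks, while arch code keeps only what
is arch specific):

/* mm/memory_hotplug.c: generic policy in one place (sketch) */
static bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);

	if (!mhp_memmap_on_memory())
		return false;
	if (size != memory_block_size_bytes())
		return false;
	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
		return false;
	/* the pageblock check would also live here */
	return arch_mhp_supports_memmap_on_memory(size);
}

/* arch/x86, arch/arm64: keep the PMD-mapped vmemmap requirement (sketch) */
bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long vmemmap_size = (size >> PAGE_SHIFT) * sizeof(struct page);

	return IS_ALIGNED(vmemmap_size, PMD_SIZE);
}

/* arch/powerpc radix: accept a PTE-mapped vmemmap (sketch) */
bool arch_mhp_supports_memmap_on_memory(unsigned long size)
{
	return radix_enabled();
}
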
Aneesh Kumar K V July 6, 2023, 9:27 a.m. UTC | #2
On 7/6/23 2:37 PM, David Hildenbrand wrote:
> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>> Radix vmemmap mapping can map things correctly at the PMD or PTE level
>> based on different device boundary checks. We also use the altmap.reserve
>> feature to align things correctly at pageblock granularity. We can end up
>> losing some pages in memory with this. For example, with a 256MB memory
>> block size we require 4 pages to map the vmemmap pages; in order to align
>> things correctly we end up adding a reserve of 28 pages. That is, for
>> every 4096 pages, 28 pages get reserved.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/Kconfig                          |  1 +
>>   arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
>>   .../platforms/pseries/hotplug-memory.c        |  4 ++-
>>   3 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 116d6add0bb0..f890907e5bbf 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -157,6 +157,7 @@ config PPC
>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>       select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>       select ARCH_KEEP_MEMBLOCK
>> +    select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE    if PPC_RADIX_MMU
>>       select ARCH_MIGHT_HAVE_PC_PARPORT
>>       select ARCH_MIGHT_HAVE_PC_SERIO
>>       select ARCH_OPTIONAL_KERNEL_RWX        if ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index a62729f70f2a..c0bd60b5fb64 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>>         return 1;
>>   }
>> +
>> +/*
>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into details
>> + * some of the restrictions. We don't check for PMD_SIZE because our
>> + * vmemmap allocation code can fallback correctly. The pageblock
> 
> x86 also has the fallback IIRC; the concern is rather that you end up with a PTE-mapped vmemmap, which is slower at runtime than a PMD-mapped vmemmap.


On power, the memory block size depends on a hypervisor configuration option and can range from 16MB to 512MB.
With these different memory block sizes, I was thinking that relaxing these checks makes the feature more useful. Also, with a page size
of 64K, a 2M vmemmap mapping requires a larger memory block size.
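
To put numbers on that (back-of-the-envelope, assuming sizeof(struct page)
== 64; the constants are assumptions for illustration):

	/*
	 * A PMD-mapped (2M) vmemmap chunk describes
	 *	2M / 64 bytes	= 32768 struct pages
	 * which, with a 64K base page size, corresponds to
	 *	32768 * 64K	= 2GB of plugged memory,
	 * far larger than any of the 16MB-512MB memory block sizes
	 * seen on power.
	 */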

> 
>> + * alignment requirement is met using altmap->reserve blocks.
>> + */
>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>> +{
>> +    if (!radix_enabled())
>> +        return false;
>> +    /*
>> +     * The pageblock alignment requirement is met by using
>> +     * reserve blocks in altmap.
>> +     */
>> +    return size == memory_block_size_bytes();
>> +}
> 
> I really dislike such arch overrides.
> 
> I think the flow should be something like that, having a generic:
> 
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
>     ...
>     return arch_mhp_supports_memmap_on_memory(size)) &&
>            size == memory_block_size_bytes() &&
>            ...
> }
> 

Sure, we can do that. But for ppc64 we also want to skip the PMD_SIZE and pageblock
alignment restrictions.

> where we'll also keep the pageblock check here.

For ppc64, I converted that pageblock rule into a reserve-blocks allocation with altmap.
I am not sure whether we want to do that outside of ppc64?
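
For reference, a sketch of what that conversion looks like when the memory
block is added (field usage follows struct vmem_altmap; the exact hook in
the generic add_memory path is an assumption here, and
memory_block_align_base() is the arch helper from this patch):

	/*
	 * Sketch: instead of refusing hotplug when the vmemmap does not end
	 * on a pageblock boundary, reserve the pad pages in the altmap.
	 */
	struct vmem_altmap mhp_altmap = {
		.base_pfn = PHYS_PFN(start),
		/* vmemmap pages are allocated from the hotplugged range */
		.free     = PHYS_PFN(size),
		/* pad so that base_pfn + reserve + vmemmap pages, i.e. the
		 * first usable PFN, ends up pageblock aligned */
		.reserve  = memory_block_align_base(res),
	};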

> 
> And a ppc specific
> 
> bool arch_mhp_supports_memmap_on_memory(unsigned long size)
> {
>     /*
>          * Don't check for the vmemmap covering PMD_SIZE, we accept that
>          * we might get a PTE-mapped vmemmap.
>          */
>     return radix_enabled();
> }
> 
> We can then move the PMD_SIZE check into arch specific code (x86-aarch64).
> 

Sure, I will rework the changes accordingly.

-aneesh

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 116d6add0bb0..f890907e5bbf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -157,6 +157,7 @@  config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
+	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index a62729f70f2a..c0bd60b5fb64 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1678,3 +1678,31 @@  int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	return 1;
 }
+
+/*
+ * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into the details
+ * of some of the restrictions. We don't check for PMD_SIZE because our
+ * vmemmap allocation code can fall back correctly. The pageblock
+ * alignment requirement is met using altmap->reserve blocks.
+ */
+bool mhp_supports_memmap_on_memory(unsigned long size)
+{
+	if (!radix_enabled())
+		return false;
+	/*
+	 * The pageblock alignment requirement is met by using
+	 * reserve blocks in altmap.
+	 */
+	return size == memory_block_size_bytes();
+}
+
+unsigned long memory_block_align_base(struct resource *res)
+{
+	unsigned long base_pfn = PHYS_PFN(res->start);
+	unsigned long align, size = resource_size(res);
+	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
+	unsigned long vmemmap_size = (nr_vmemmap_pages * sizeof(struct page)) >> PAGE_SHIFT;
+
+	align = pageblock_align(base_pfn + vmemmap_size) - (base_pfn + vmemmap_size);
+	return align;
+}
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9c62c2c3b3d0..326db26d773e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -617,6 +617,7 @@  static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
+	mhp_t mhp_flags = MHP_NONE;
 	unsigned long block_sz;
 	int nid, rc;
 
@@ -637,7 +638,8 @@  static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		nid = first_online_node;
 
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
+	mhp_flags |= get_memmap_on_memory_flags();
+	rc = __add_memory(nid, lmb->base_addr, block_sz, mhp_flags);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;