Message ID | 20210430143607.135005-4-leobras.c@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | powerpc/mm/hash: Time improvements for memory hot(un)plug |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (e3a9b9d6a03f5fbf99b540e863b001d46ba1735c) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 153 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> During memory hotunplug, after each LMB is removed, the HPT may be
> resized down if it would map a maximum of 4 times the current amount
> of memory (2 shifts, due to the introduced hysteresis).
>
> This is usually not an issue, but it can take a lot of time if HPT
> resizing down fails. This happens because resize-down failures usually
> repeat at each LMB removal, until there are no more conflicting bolted
> entries, which can take a while.
>
> This can be solved by doing a single HPT resize at the end of memory
> hotunplug, after all requested entries are removed.
>
> To make this happen, it's necessary to temporarily disable all HPT
> resize-downs before hotunplug, re-enable them after hotunplug ends,
> and then resize the HPT down to the current memory size.
>
> As an example, hotunplugging 256GB from a 385GB guest took 621s
> without this patch, and 100s with it applied.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Hrm. This looks correct, but it seems overly complicated.

AFAICT, the resize calls that this adds should in practice be the
*only* times we call resize; all the calls from the lower-level code
should be suppressed. In which case, can't we just remove those calls
entirely and not deal with the clunky locking and exclusion here? That
should also remove the need for the 'shrinking' parameter in 1/3.
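For readers following along, the 2-shift hysteresis referred to above can be modeled outside the kernel. The sketch below is a simplification under stated assumptions: `hpt_shift_for()` is a hypothetical stand-in for the kernel's `htab_shift_for_mem_size()` (a plain floor-log2, ignoring real HPT geometry), and `pft_size` plays the role of `ppc64_pft_size`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the shrink hysteresis. This is NOT the kernel's
 * htab_shift_for_mem_size(); hpt_shift_for() is a hypothetical
 * stand-in that just takes the floor log2 of the memory size, which is
 * enough to illustrate the 2-shift rule.
 */
static unsigned int hpt_shift_for(unsigned long long mem_size)
{
	unsigned int shift = 0;

	while ((1ULL << (shift + 1)) <= mem_size)
		shift++;
	return shift;
}

/*
 * Mirror of the patch's decision: shrink only when the target shift is
 * at least 2 below the current one (pft_size), i.e. only when the
 * current HPT could map at least 4x the remaining memory.
 */
static bool should_shrink(unsigned long long new_mem_size,
			  unsigned int pft_size)
{
	return hpt_shift_for(new_mem_size) < pft_size - 1;
}
```

With a current shift of 30, removing memory down to a size whose target shift is 29 does nothing; only once the target drops to 28 or below does a shrink happen, which is what keeps sizes fluctuating across a boundary from causing resize churn.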
> [full patch snipped]
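The trylock-based suppression the patch relies on can be modeled in user space with POSIX mutexes. This is a sketch, not kernel code: the names are hypothetical, `per_lmb_resize()` stands in for the shrink path of `resize_hpt_for_hotplug()`, `batch_shrink()` for the `hash_batch_shrink_begin()`/`_end()` pair, and the final resize is shown after the unlock for simplicity.

```c
#include <assert.h>
#include <pthread.h>

/*
 * User-space model of the suppression trick: the batch path holds the
 * mutex, so the per-LMB path's trylock fails (EBUSY) and its resize is
 * skipped; a single resize happens once the batch is done.
 */
static pthread_mutex_t resize_down_lock = PTHREAD_MUTEX_INITIALIZER;
static int resizes_done;

/* Stand-in for the per-LMB resize-down path. */
static void per_lmb_resize(void)
{
	if (pthread_mutex_trylock(&resize_down_lock) != 0)
		return;	/* batch removal in progress: skip the resize */
	resizes_done++;
	pthread_mutex_unlock(&resize_down_lock);
}

/* Stand-in for the begin/end pair wrapped around the LMB removals. */
static void batch_shrink(int nlmbs)
{
	int i;

	pthread_mutex_lock(&resize_down_lock);
	for (i = 0; i < nlmbs; i++)
		per_lmb_resize();	/* all suppressed by the held lock */
	pthread_mutex_unlock(&resize_down_lock);
	per_lmb_resize();		/* the single resize at the end */
}
```

However many LMBs are removed, only one resize actually runs, which is the whole point of the batching.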
On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > [commit message snipped]
>
> Hrm. This looks correct, but it seems overly complicated.
>
> AFAICT, the resize calls that this adds should in practice be the
> *only* times we call resize, all the calls from the lower level code
> should be suppressed.

That's correct.

> In which case can't we just remove those calls entirely, and not deal
> with the clunky locking and exclusion here. That should also remove
> the need for the 'shrinking' parameter in 1/3.

If I get your suggestion correctly, you suggest something like:

1 - Never call resize_hpt_for_hotplug() in
hash__remove_section_mapping(), thus not needing the shrinking
parameter.
2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
fact call another function to do the batch resize_hpt_for_hotplug() for
them.

If so, that assumes no other function currently calls
resize_hpt_for_hotplug() through another path, or that if they do, they
do not actually need to resize the HPT.

Is the above correct?

There are some examples of functions that currently call
resize_hpt_for_hotplug() by another path:

add_memory_driver_managed
	virtio_mem_add_memory
	dev_dax_kmem_probe
reserve_additional_memory
	balloon_process
	add_ballooned_pages
__add_memory
	probe_store
__remove_memory
	pseries_remove_memblock
remove_memory
	dev_dax_kmem_remove
	virtio_mem_remove_memory
memunmap_pages
	pci_p2pdma_add_resource
	virtio_fs_setup_dax

Best regards,
Leonardo Bras
On Wed, Jun 09, 2021 at 02:30:36AM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > > [commit message snipped]
> >
> > Hrm. This looks correct, but it seems overly complicated.
> >
> > AFAICT, the resize calls that this adds should in practice be the
> > *only* times we call resize, all the calls from the lower level code
> > should be suppressed.
>
> That's correct.
>
> > In which case can't we just remove those calls entirely, and not
> > deal with the clunky locking and exclusion here. That should also
> > remove the need for the 'shrinking' parameter in 1/3.
>
> If I get your suggestion correctly, you suggest something like:
>
> 1 - Never call resize_hpt_for_hotplug() in
> hash__remove_section_mapping(), thus not needing the shrinking
> parameter.
> 2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would
> in fact call another function to do the batch resize_hpt_for_hotplug()
> for them.

Basically, yes.

> If so, that assumes no other function currently calls
> resize_hpt_for_hotplug() through another path, or that if they do,
> they do not actually need to resize the HPT.
>
> Is the above correct?
>
> There are some examples of functions that currently call
> resize_hpt_for_hotplug() by another path:
>
> add_memory_driver_managed
> 	virtio_mem_add_memory
> 	dev_dax_kmem_probe

Oh... virtio-mem. I didn't think of that.

> reserve_additional_memory
> 	balloon_process
> 	add_ballooned_pages

AFAICT this comes from drivers/xen, and Xen has never been a thing on
POWER.

> __add_memory
> 	probe_store

So this is a sysfs-triggered memory add. If the user is doing this
manually, then I think it's reasonable for them to manually manage the
HPT size as well, which they can do through debugfs. I think it might
also be used by drmgr under pHyp, but pHyp doesn't support HPT
resizing.

> __remove_memory
> 	pseries_remove_memblock

Huh, this one comes through OF_RECONFIG_DETACH_NODE. I don't really
know when those happen, but I strongly suspect it's only under pHyp
again.

> remove_memory
> 	dev_dax_kmem_remove
> 	virtio_mem_remove_memory

virtio-mem again.

> memunmap_pages
> 	pci_p2pdma_add_resource
> 	virtio_fs_setup_dax

And virtio-fs in dax mode. Didn't think of that either.

Ugh, yeah, I'm used to the world where the platform provides the only
way of hotplugging memory, but virtio-mem does indeed provide another
one, and we could indeed need to manage the HPT size based on that.
Drat, so moving all the HPT resizing handling up into
pseries/hotplug-memory.c won't work.

I still think we can simplify the communication between the stuff in
the pseries hotplug code and the actual hash resizing.
In your draft there are kind of 3 ways the information is conveyed: the
mutex suppresses HPT shrinks, pre-growing past what we need prevents
HPT grows, and the 'shrinking' flag handles some edge cases.

I suggest instead a single flag that will suppress all the current
resizes. Not sure it technically has to be an atomic mutex, but that's
probably the obvious safe choice. Then have a "resize up to target" and
a "resize down to target" that ignore that suppression and are no-ops
if the target is in the other direction.

Then you should be able to make the path for pseries hotplugs be:

	suppress other resizes
	resize up to target
	do the actual adds or removes
	resize down to target
	unsuppress other resizes
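A minimal user-space sketch of this suggested scheme follows. All names are hypothetical; a real implementation would guard the flag with a mutex and call mmu_hash_ops.resize_hpt() rather than assigning a shift directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated HPT state: just the current shift. */
static unsigned int cur_shift = 30;
static bool resize_suppressed;

/* The existing resize path: becomes a no-op while suppressed. */
static void resize_hpt(unsigned int target)
{
	if (resize_suppressed)
		return;
	cur_shift = target;
}

/* "Resize up to target": ignores suppression, no-op if target is not
 * larger than the current shift. */
static void resize_hpt_up_to(unsigned int target)
{
	if (target > cur_shift)
		cur_shift = target;
}

/* "Resize down to target": ignores suppression, no-op if target is not
 * smaller than the current shift. */
static void resize_hpt_down_to(unsigned int target)
{
	if (target < cur_shift)
		cur_shift = target;
}

/* The suggested path for pseries hotplugs. */
static void batch_hotplug(unsigned int target, int nlmbs)
{
	int i;

	resize_suppressed = true;	/* suppress other resizes */
	resize_hpt_up_to(target);	/* no-op for an unplug */
	for (i = 0; i < nlmbs; i++)
		resize_hpt(target);	/* per-LMB calls are suppressed */
	resize_hpt_down_to(target);	/* single resize at the end */
	resize_suppressed = false;	/* unsuppress other resizes */
}
```

The same batch_hotplug() sequence then works for both adds and removes: exactly one of the up-to/down-to calls does anything, and the per-LMB resizes in between are swallowed by the flag.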
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index fad4af8b8543..6cd66e7e98c9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -256,6 +256,8 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 void hash_batch_expand_prepare(unsigned long newsize);
+void hash_batch_shrink_begin(void);
+void hash_batch_shrink_end(void);
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 3fa395b3fe57..73ecd0f61acd 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -795,6 +795,9 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+
+static DEFINE_MUTEX(hpt_resize_down_lock);
+
 static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 {
 	unsigned target_hpt_shift;
@@ -805,7 +808,7 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
 
 	if (shrinking) {
-
+		int ret;
 		/*
 		 * To avoid lots of HPT resizes if memory size is fluctuating
 		 * across a boundary, we deliberately have some hysterisis
@@ -818,10 +821,20 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
 		if (target_hpt_shift >= ppc64_pft_size - 1)
 			return 0;
 
-	} else if (target_hpt_shift <= ppc64_pft_size) {
-		return 0;
+		/* When batch removing entries, only resizes HPT at the end. */
+
+		if (!mutex_trylock(&hpt_resize_down_lock))
+			return 0;
+
+		ret = mmu_hash_ops.resize_hpt(target_hpt_shift);
+
+		mutex_unlock(&hpt_resize_down_lock);
+		return ret;
 	}
 
+	if (target_hpt_shift <= ppc64_pft_size)
+		return 0;
+
 	return mmu_hash_ops.resize_hpt(target_hpt_shift);
 }
 
@@ -879,6 +892,32 @@ void hash_batch_expand_prepare(unsigned long newsize)
 			break;
 	}
 }
+
+void hash_batch_shrink_begin(void)
+{
+	/* Disable HPT resize-down during hot-unplug */
+	mutex_lock(&hpt_resize_down_lock);
+}
+
+void hash_batch_shrink_end(void)
+{
+	const u64 starting_size = ppc64_pft_size;
+	unsigned long newsize;
+
+	newsize = memblock_phys_mem_size();
+	/* Resize to smallest SHIFT possible */
+	while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
+		newsize *= 2;
+		pr_warn("Hash collision while resizing HPT\n");
+
+		/* Do not try to resize to the starting size, or bigger value */
+		if (htab_shift_for_mem_size(newsize) >= starting_size)
+			break;
+	}
+
+	/* Re-enables HPT resize-down after hot-unplug */
+	mutex_unlock(&hpt_resize_down_lock);
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __init hash_init_partition_table(phys_addr_t hash_table,
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 48b2cfe4ce69..44bc50d72353 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -426,6 +426,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
 		return -EINVAL;
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_begin();
+
 	for_each_drmem_lmb(lmb) {
 		rc = dlpar_remove_lmb(lmb);
 		if (rc)
@@ -471,6 +474,9 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
 		rc = 0;
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_end();
+
 	return rc;
 }
 
@@ -533,6 +539,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 	if (lmbs_available < lmbs_to_remove)
 		return -EINVAL;
 
+	if (!radix_enabled())
+		hash_batch_shrink_begin();
+
 	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
 		if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
 			continue;
@@ -573,6 +582,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 		}
 	}
 
+	if (!radix_enabled())
+		hash_batch_shrink_end();
+
 	return rc;
 }
 
@@ -703,6 +715,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 	if (lmbs_added != lmbs_to_add) {
 		pr_err("Memory hot-add failed, removing any added LMBs\n");
 
+		if (!radix_enabled())
+			hash_batch_shrink_begin();
+
 		for_each_drmem_lmb(lmb) {
 			if (!drmem_lmb_reserved(lmb))
 				continue;
@@ -716,6 +731,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 
 			drmem_remove_lmb_reservation(lmb);
 		}
+
+		if (!radix_enabled())
+			hash_batch_shrink_end();
+
 		rc = -EINVAL;
 	} else {
 		for_each_drmem_lmb(lmb) {
@@ -817,6 +836,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 	if (rc) {
 		pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
 
+		if (!radix_enabled())
+			hash_batch_shrink_begin();
+
 		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
 			if (!drmem_lmb_reserved(lmb))
 				continue;
@@ -830,6 +852,10 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 
 			drmem_remove_lmb_reservation(lmb);
 		}
+
+		if (!radix_enabled())
+			hash_batch_shrink_end();
+
 		rc = -EINVAL;
 	} else {
 		for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
During memory hotunplug, after each LMB is removed, the HPT may be
resized down if it would map a maximum of 4 times the current amount
of memory (2 shifts, due to the introduced hysteresis).

This is usually not an issue, but it can take a lot of time if HPT
resizing down fails. This happens because resize-down failures usually
repeat at each LMB removal, until there are no more conflicting bolted
entries, which can take a while.

This can be solved by doing a single HPT resize at the end of memory
hotunplug, after all requested entries are removed.

To make this happen, it's necessary to temporarily disable all HPT
resize-downs before hotunplug, re-enable them after hotunplug ends,
and then resize the HPT down to the current memory size.

As an example, hotunplugging 256GB from a 385GB guest took 621s
without this patch, and 100s with it applied.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/hash.h  |  2 +
 arch/powerpc/mm/book3s64/hash_utils.c      | 45 +++++++++++++++++--
 .../platforms/pseries/hotplug-memory.c     | 26 +++++++++++
 3 files changed, 70 insertions(+), 3 deletions(-)
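The retry loop in hash_batch_shrink_end() can be simulated in user space. In this sketch the names are hypothetical: try_resize() stands in for resize_hpt_for_hotplug(), failing with -ENOSPC whenever the target shift is below a floor imposed by simulated bolted entries, and shift_for() is a plain floor-log2 rather than the kernel's htab_shift_for_mem_size().

```c
#include <assert.h>
#include <errno.h>

/* Smallest shift the simulated "HPT" will accept (bolted-entry floor). */
static unsigned int min_ok_shift;

static unsigned int shift_for(unsigned long long mem)
{
	unsigned int s = 0;

	while ((1ULL << (s + 1)) <= mem)
		s++;
	return s;
}

/* Stand-in for resize_hpt_for_hotplug() in shrink mode. */
static int try_resize(unsigned long long mem)
{
	return shift_for(mem) < min_ok_shift ? -ENOSPC : 0;
}

/*
 * Mirrors the loop in hash_batch_shrink_end(): start from the current
 * memory size, and on each -ENOSPC double the target (one shift
 * bigger), giving up before reaching the starting shift. Returns the
 * shift finally requested.
 */
static unsigned int shrink_to_smallest(unsigned long long newsize,
				       unsigned int starting_shift)
{
	while (try_resize(newsize) == -ENOSPC) {
		newsize *= 2;
		if (shift_for(newsize) >= starting_shift)
			break;
	}
	return shift_for(newsize);
}
```

So if bolted entries block every target below shift 27, a shrink request starting at shift 25 walks up to 27 and succeeds there; if nothing below the starting shift is acceptable, the loop gives up at the starting shift rather than retrying forever.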