Message ID | 20210430143607.135005-2-leobras.c@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | powerpc/mm/hash: Time improvements for memory hot(un)plug | expand |
Related | show |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (e3a9b9d6a03f5fbf99b540e863b001d46ba1735c) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 61 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote: > Because hypervisors may need to create HPTs without knowing the guest > page size, the smallest used page-size (4k) may be chosen, resulting in > a HPT that is possibly bigger than needed. > > On a guest with bigger page-sizes, the amount of entries for HTP may be > too high, causing the guest to ask for a HPT resize-down on the first > hotplug. > > This becomes a problem when HPT resize-down fails, and causes the > HPT resize to be performed on every LMB added, until HPT size is > compatible to guest memory size, causing a major slowdown. > > So, avoiding HPT resizing-down on hot-add significantly improves memory > hotplug times. > > As an example, hotplugging 256GB on a 129GB guest took 710s without this > patch, and 21s after applied. > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Sorry it's taken me so long to look at these I don't love the extra statefulness that the 'shrinking' parameter adds, but I can't see an elegant way to avoid it, so: Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > --- > arch/powerpc/mm/book3s64/hash_utils.c | 36 ++++++++++++++++----------- > 1 file changed, 21 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c > index 581b20a2feaf..608e4ed397a9 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -795,7 +795,7 @@ static unsigned long __init htab_get_table_size(void) > } > > #ifdef CONFIG_MEMORY_HOTPLUG > -static int resize_hpt_for_hotplug(unsigned long new_mem_size) > +static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking) > { > unsigned target_hpt_shift; > > @@ -804,19 +804,25 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size) > > target_hpt_shift = htab_shift_for_mem_size(new_mem_size); > > - /* > - * To avoid lots of HPT resizes if memory size is fluctuating > - * across a boundary, we deliberately have some hysterisis > - * here: we immediately increase the HPT size if the target > - * shift exceeds the current shift, but we won't attempt to > - * reduce unless the target shift is at least 2 below the > - * current shift > - */ > - if (target_hpt_shift > ppc64_pft_size || > - target_hpt_shift < ppc64_pft_size - 1) > - return mmu_hash_ops.resize_hpt(target_hpt_shift); > + if (shrinking) { > > - return 0; > + /* > + * To avoid lots of HPT resizes if memory size is fluctuating > + * across a boundary, we deliberately have some hysterisis > + * here: we immediately increase the HPT size if the target > + * shift exceeds the current shift, but we won't attempt to > + * reduce unless the target shift is at least 2 below the > + * current shift > + */ > + > + if (target_hpt_shift >= ppc64_pft_size - 1) > + return 0; > + > + } else if (target_hpt_shift <= ppc64_pft_size) { > + return 0; > + } > + > + return mmu_hash_ops.resize_hpt(target_hpt_shift); > } > > int hash__create_section_mapping(unsigned long start, unsigned long end, > @@ -829,7 +835,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end, > return -1; > } > > - resize_hpt_for_hotplug(memblock_phys_mem_size()); > + resize_hpt_for_hotplug(memblock_phys_mem_size(), false); > > rc = htab_bolt_mapping(start, end, __pa(start), > pgprot_val(prot), mmu_linear_psize, > @@ -848,7 +854,7 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end) > int rc = htab_remove_mapping(start, end, mmu_linear_psize, > mmu_kernel_ssize); > > - if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC) > + if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC) > pr_warn("Hash collision while resizing HPT\n"); > > return rc;
On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote: > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote: > > Because hypervisors may need to create HPTs without knowing the > > guest > > page size, the smallest used page-size (4k) may be chosen, > > resulting in > > a HPT that is possibly bigger than needed. > > > > On a guest with bigger page-sizes, the amount of entries for HTP > > may be > > too high, causing the guest to ask for a HPT resize-down on the > > first > > hotplug. > > > > This becomes a problem when HPT resize-down fails, and causes the > > HPT resize to be performed on every LMB added, until HPT size is > > compatible to guest memory size, causing a major slowdown. > > > > So, avoiding HPT resizing-down on hot-add significantly improves > > memory > > hotplug times. > > > > As an example, hotplugging 256GB on a 129GB guest took 710s without > > this > > patch, and 21s after applied. > > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com> > > Sorry it's taken me so long to look at these > > I don't love the extra statefulness that the 'shrinking' parameter > adds, but I can't see an elegant way to avoid it, so: > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> np, thanks for reviewing! Best regards, Leonardo Bras
On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote: > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote: > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote: > > > Because hypervisors may need to create HPTs without knowing the > > > guest > > > page size, the smallest used page-size (4k) may be chosen, > > > resulting in > > > a HPT that is possibly bigger than needed. > > > > > > On a guest with bigger page-sizes, the amount of entries for HTP > > > may be > > > too high, causing the guest to ask for a HPT resize-down on the > > > first > > > hotplug. > > > > > > This becomes a problem when HPT resize-down fails, and causes the > > > HPT resize to be performed on every LMB added, until HPT size is > > > compatible to guest memory size, causing a major slowdown. > > > > > > So, avoiding HPT resizing-down on hot-add significantly improves > > > memory > > > hotplug times. > > > > > > As an example, hotplugging 256GB on a 129GB guest took 710s without > > > this > > > patch, and 21s after applied. > > > > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com> > > > > Sorry it's taken me so long to look at these > > > > I don't love the extra statefulness that the 'shrinking' parameter > > adds, but I can't see an elegant way to avoid it, so: > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > > np, thanks for reviewing! Actually... I take that back. With the subsequent patches my discomfort with the complexity of implementing the batching grew. I think I can see a simpler way - although it wasn't as clear as I thought it might be, without some deep history on this feature. What's going on here is pretty hard to follow, because it starts in arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c) where it processes the add/remove requests, then goes into generic code __add_memory() which eventually emerges back in arch specific code (hash__create_section_mapping()). The HPT resizing calls are in the "inner" arch specific section, whereas it's only the outer arch section that has the information to batch properly. The mutex and 'shrinking' parameter in Leonardo's code are all about conveying information from the outer to inner section. Now, I think the reason I had the resize calls in the inner section was to accomodate the notion that a) pHyp might support resizing in future, and it could come in through a different path with its drmgr thingy and/or b) bare metal hash architectures might want to implement hash resizing, and this would make at least part of the path common. Given the decreasing relevance of hash MMUs, I think we can now safely say neither of these is ever going to happen. Therefore, we can simplify things by moving the HPT resize calls into the pseries LMB code, instead of create/remove_section_mapping. Then to do batching without extra complications we just need this logic for all resizes (both add and remove): let new_hpt_order = expected HPT size for new mem size; if (new_hpt_order > current_hpt_order) resize to new_hpt_order add/remove memory if (new_hpt_order < current_hpt_order - 1) resize to new_hpt_order
On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote: > On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote: > > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote: > > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote: > > > > Because hypervisors may need to create HPTs without knowing the > > > > guest > > > > page size, the smallest used page-size (4k) may be chosen, > > > > resulting in > > > > a HPT that is possibly bigger than needed. > > > > > > > > On a guest with bigger page-sizes, the amount of entries for > > > > HTP > > > > may be > > > > too high, causing the guest to ask for a HPT resize-down on the > > > > first > > > > hotplug. > > > > > > > > This becomes a problem when HPT resize-down fails, and causes > > > > the > > > > HPT resize to be performed on every LMB added, until HPT size > > > > is > > > > compatible to guest memory size, causing a major slowdown. > > > > > > > > So, avoiding HPT resizing-down on hot-add significantly > > > > improves > > > > memory > > > > hotplug times. > > > > > > > > As an example, hotplugging 256GB on a 129GB guest took 710s > > > > without > > > > this > > > > patch, and 21s after applied. > > > > > > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com> > > > > > > Sorry it's taken me so long to look at these > > > > > > I don't love the extra statefulness that the 'shrinking' > > > parameter > > > adds, but I can't see an elegant way to avoid it, so: > > > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > > > > np, thanks for reviewing! > > Actually... I take that back. With the subsequent patches my > discomfort with the complexity of implementing the batching grew. > > I think I can see a simpler way - although it wasn't as clear as I > thought it might be, without some deep history on this feature. > > What's going on here is pretty hard to follow, because it starts in > arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c) > where it processes the add/remove requests, then goes into generic > code __add_memory() which eventually emerges back in arch specific > code (hash__create_section_mapping()). > > The HPT resizing calls are in the "inner" arch specific section, > whereas it's only the outer arch section that has the information to > batch properly. The mutex and 'shrinking' parameter in Leonardo's > code are all about conveying information from the outer to inner > section. > > Now, I think the reason I had the resize calls in the inner section > was to accomodate the notion that a) pHyp might support resizing in > future, and it could come in through a different path with its drmgr > thingy and/or b) bare metal hash architectures might want to > implement > hash resizing, and this would make at least part of the path common. > > Given the decreasing relevance of hash MMUs, I think we can now > safely > say neither of these is ever going to happen. > > Therefore, we can simplify things by moving the HPT resize calls into > the pseries LMB code, instead of create/remove_section_mapping. Then > to do batching without extra complications we just need this logic > for > all resizes (both add and remove): > > let new_hpt_order = expected HPT size for new mem size; > > if (new_hpt_order > current_hpt_order) > resize to new_hpt_order > > add/remove memory > > if (new_hpt_order < current_hpt_order - 1) > resize to new_hpt_order > > Ok, that really does seem to simplify a lot the batching. Question: by LMB code, you mean dlpar_memory_{add,remove}_by_* ? (dealing only with dlpar_{add,remove}_lmb() would not be enough to deal with batching) In my 3/3 repĺy I sent you some other examples of functions that currently end up calling resize_hpt_for_hotplug() without comming from hotplug-memory.c. Is that ok that they do not call it anymore? Best regards, Leonardo Bras
On Wed, Jun 09, 2021 at 02:51:49AM -0300, Leonardo Brás wrote: > On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote: > > On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote: > > > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote: > > > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote: > > > > > Because hypervisors may need to create HPTs without knowing the > > > > > guest > > > > > page size, the smallest used page-size (4k) may be chosen, > > > > > resulting in > > > > > a HPT that is possibly bigger than needed. > > > > > > > > > > On a guest with bigger page-sizes, the amount of entries for > > > > > HTP > > > > > may be > > > > > too high, causing the guest to ask for a HPT resize-down on the > > > > > first > > > > > hotplug. > > > > > > > > > > This becomes a problem when HPT resize-down fails, and causes > > > > > the > > > > > HPT resize to be performed on every LMB added, until HPT size > > > > > is > > > > > compatible to guest memory size, causing a major slowdown. > > > > > > > > > > So, avoiding HPT resizing-down on hot-add significantly > > > > > improves > > > > > memory > > > > > hotplug times. > > > > > > > > > > As an example, hotplugging 256GB on a 129GB guest took 710s > > > > > without > > > > > this > > > > > patch, and 21s after applied. > > > > > > > > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com> > > > > > > > > Sorry it's taken me so long to look at these > > > > > > > > I don't love the extra statefulness that the 'shrinking' > > > > parameter > > > > adds, but I can't see an elegant way to avoid it, so: > > > > > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > > > > > > np, thanks for reviewing! > > > > Actually... I take that back. With the subsequent patches my > > discomfort with the complexity of implementing the batching grew. > > > > I think I can see a simpler way - although it wasn't as clear as I > > thought it might be, without some deep history on this feature. > > > > What's going on here is pretty hard to follow, because it starts in > > arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c) > > where it processes the add/remove requests, then goes into generic > > code __add_memory() which eventually emerges back in arch specific > > code (hash__create_section_mapping()). > > > > The HPT resizing calls are in the "inner" arch specific section, > > whereas it's only the outer arch section that has the information to > > batch properly. The mutex and 'shrinking' parameter in Leonardo's > > code are all about conveying information from the outer to inner > > section. > > > > Now, I think the reason I had the resize calls in the inner section > > was to accomodate the notion that a) pHyp might support resizing in > > future, and it could come in through a different path with its drmgr > > thingy and/or b) bare metal hash architectures might want to > > implement > > hash resizing, and this would make at least part of the path common. > > > > Given the decreasing relevance of hash MMUs, I think we can now > > safely > > say neither of these is ever going to happen. > > > > Therefore, we can simplify things by moving the HPT resize calls into > > the pseries LMB code, instead of create/remove_section_mapping. Then > > to do batching without extra complications we just need this logic > > for > > all resizes (both add and remove): > > > > let new_hpt_order = expected HPT size for new mem size; > > > > if (new_hpt_order > current_hpt_order) > > resize to new_hpt_order > > > > add/remove memory > > > > if (new_hpt_order < current_hpt_order - 1) > > resize to new_hpt_order > > > > > > > Ok, that really does seem to simplify a lot the batching. > > Question: > by LMB code, you mean dlpar_memory_{add,remove}_by_* ? > (dealing only with dlpar_{add,remove}_lmb() would not be enough to deal > with batching) I was thinking of a two stage process. First moving the resizes to dlpar_{add,remote}_lmb() (not changing behaviour for the pseries dlpar path), then implementing the batching by moving to the {add,remove}_by functions. But.. > In my 3/3 repĺy I sent you some other examples of functions that > currently end up calling resize_hpt_for_hotplug() without comming from > hotplug-memory.c. Is that ok that they do not call it anymore? ..as I replied there, I was wrong about it being safe to move the resizes all to the pseries dlpar code.
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index 581b20a2feaf..608e4ed397a9 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -795,7 +795,7 @@ static unsigned long __init htab_get_table_size(void) } #ifdef CONFIG_MEMORY_HOTPLUG -static int resize_hpt_for_hotplug(unsigned long new_mem_size) +static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking) { unsigned target_hpt_shift; @@ -804,19 +804,25 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size) target_hpt_shift = htab_shift_for_mem_size(new_mem_size); - /* - * To avoid lots of HPT resizes if memory size is fluctuating - * across a boundary, we deliberately have some hysterisis - * here: we immediately increase the HPT size if the target - * shift exceeds the current shift, but we won't attempt to - * reduce unless the target shift is at least 2 below the - * current shift - */ - if (target_hpt_shift > ppc64_pft_size || - target_hpt_shift < ppc64_pft_size - 1) - return mmu_hash_ops.resize_hpt(target_hpt_shift); + if (shrinking) { - return 0; + /* + * To avoid lots of HPT resizes if memory size is fluctuating + * across a boundary, we deliberately have some hysterisis + * here: we immediately increase the HPT size if the target + * shift exceeds the current shift, but we won't attempt to + * reduce unless the target shift is at least 2 below the + * current shift + */ + + if (target_hpt_shift >= ppc64_pft_size - 1) + return 0; + + } else if (target_hpt_shift <= ppc64_pft_size) { + return 0; + } + + return mmu_hash_ops.resize_hpt(target_hpt_shift); } int hash__create_section_mapping(unsigned long start, unsigned long end, @@ -829,7 +835,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end, return -1; } - resize_hpt_for_hotplug(memblock_phys_mem_size()); + resize_hpt_for_hotplug(memblock_phys_mem_size(), false); rc = htab_bolt_mapping(start, end, __pa(start), pgprot_val(prot), mmu_linear_psize, @@ -848,7 +854,7 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end) int rc = htab_remove_mapping(start, end, mmu_linear_psize, mmu_kernel_ssize); - if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC) + if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC) pr_warn("Hash collision while resizing HPT\n"); return rc;
Because hypervisors may need to create HPTs without knowing the guest page size, the smallest used page-size (4k) may be chosen, resulting in a HPT that is possibly bigger than needed. On a guest with bigger page-sizes, the amount of entries for HTP may be too high, causing the guest to ask for a HPT resize-down on the first hotplug. This becomes a problem when HPT resize-down fails, and causes the HPT resize to be performed on every LMB added, until HPT size is compatible to guest memory size, causing a major slowdown. So, avoiding HPT resizing-down on hot-add significantly improves memory hotplug times. As an example, hotplugging 256GB on a 129GB guest took 710s without this patch, and 21s after applied. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> --- arch/powerpc/mm/book3s64/hash_utils.c | 36 ++++++++++++++++----------- 1 file changed, 21 insertions(+), 15 deletions(-)