Message ID | 20240111-h_page_set_unused-for-lpm-v1-1-cd56184ad608@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] powerpc/pseries: exploit H_PAGE_SET_UNUSED for partition migration | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
> > Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has > been tied to the cooperative memory overcommit (CMO) platform feature, > the flag also is treated by the PowerVM hypervisor as a hint that the > page contents need not be copied to the destination during a live > partition migration. > > Use the "ibm,migratable-partition" root node property to determine > whether this partition/guest can be migrated. Mark freed pages unused > if so (or if CMO is in use, as before). > > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> > --- > Several things yet to improve here: > > * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled > from CONFIG_PPC_SMLPAR. > > * powerpc's arch_free_page() could be made to use a static key if > justified. > > * I have not yet measured the overhead this introduces, nor have I > measured the benefit to a live migration. > > To date, I have smoke tested it by doing a live migration and > performing a build on a kernel with the change, to ensure it doesn't > introduce obvious memory corruption or anything. An update on this. I've been working on quantifying the benefit, and I've now seen memory corruption after partition migrations, with several programs getting mysterious SIGSEGV/SIGILL: mobility: calling ibm,suspend-me on CPU 7 mobility: CPU 7 waking all threads mobility: waiting for memory transfer to complete... touch[10988]: segfault (11) at 8 nip 7fff8fc2f9fc lr 7fff8fc26684 code 1 in ld-2.28.so[7fff8fc20000+30000] touch[10987]: segfault (11) at 8 nip 7fff86f5f9fc lr 7fff86f56684 code 1 in ld-2.28.so[7fff86f50000+30000] touch[10988]: code: 3d22ffff 81297470 71290020 4082254c e93d00f0 2fa90000 f93f00b0 409e2458 touch[10988]: code: e93d00f8 e95d0068 eb1d0000 2fa90000 <ea2a0008> 419e00f8 e93d0158 e91d0058 touch[10987]: code: 3d22ffff 81297470 71290020 4082254c e93d00f0 2fa90000 f93f00b0 409e2458 touch[10987]: code: e93d00f8 e95d0068 eb1d0000 2fa90000 <ea2a0008> 419e00f8 e93d0158 e91d0058 Maybe I had some debug options enabled that masked this before, or just got lucky. In any case, it seems obvious now that for this to be safe, powerpc/pseries needs to implement arch_alloc_page() to undo setting the "unused" flag. > > This will be a possibly significant behavior change in that we will be > flagging pages unused where we typically did not before. Until now, > having CMO enabled was the only way to do this, and I don't think that > feature is used all that much? > > Posting this as RFC to see if there are any major concerns. > --- > arch/powerpc/platforms/pseries/lpar.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c > index 4561667832ed..b264371d8e12 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -16,6 +16,7 @@ > #include <linux/export.h> > #include <linux/jump_label.h> > #include <linux/delay.h> > +#include <linux/of.h> > #include <linux/stop_machine.h> > #include <linux/spinlock.h> > #include <linux/cpuhotplug.h> > @@ -1772,17 +1773,25 @@ static void pSeries_set_page_state(struct page *page, int order, > } > } > > +static bool migratable_partition; > + > void arch_free_page(struct page *page, int order) > { > - if (radix_enabled()) > - return; > - if (!cmo_free_hint_flag || !firmware_has_feature(FW_FEATURE_CMO)) > - return; > - > - pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED); > + if (migratable_partition || > + (firmware_has_feature(FW_FEATURE_CMO) && cmo_free_hint_flag)) > + pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED); > } > EXPORT_SYMBOL(arch_free_page); > > +static int __init check_migratable_partition(void) > +{ > + struct device_node *root = of_find_node_by_path("/"); > + migratable_partition = !!of_find_property(root, "ibm,migratable-partition", NULL); > + of_node_put(root); > + return 0; > +} > +machine_device_initcall(pseries, check_migratable_partition); > + > #endif /* CONFIG_PPC_SMLPAR */ > #endif /* CONFIG_PPC_BOOK3S_64 */ > >
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes: > From: Nathan Lynch <nathanl@linux.ibm.com> > > Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has > been tied to the cooperative memory overcommit (CMO) platform feature, > the flag also is treated by the PowerVM hypervisor as a hint that the > page contents need not be copied to the destination during a live > partition migration. > > Use the "ibm,migratable-partition" root node property to determine > whether this partition/guest can be migrated. Mark freed pages unused > if so (or if CMO is in use, as before). > > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> > --- > Several things yet to improve here: > > * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled > from CONFIG_PPC_SMLPAR. > > * powerpc's arch_free_page() could be made to use a static key if > justified. > > * I have not yet measured the overhead this introduces, nor have I > measured the benefit to a live migration. > > To date, I have smoke tested it by doing a live migration and > performing a build on a kernel with the change, to ensure it doesn't > introduce obvious memory corruption or anything. It hasn't blown up > yet :-) > > This will be a possibly significant behavior change in that we will be > flagging pages unused where we typically did not before. Until now, > having CMO enabled was the only way to do this, and I don't think that > feature is used all that much? Yeah AFAIK it has to be explicitly configured and enabled via the HMC, so doesn't get much testing or usage. > Posting this as RFC to see if there are any major concerns. My worry is that this will add overhead for everyone in normal usage, an hcall per freed set of pages, whereas the benefit is only seen when a migration happens. But that does depend on how often arch_free_page() gets called in normal usage, which I don't know offhand. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> > writes: >> From: Nathan Lynch <nathanl@linux.ibm.com> >> >> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has >> been tied to the cooperative memory overcommit (CMO) platform feature, >> the flag also is treated by the PowerVM hypervisor as a hint that the >> page contents need not be copied to the destination during a live >> partition migration. >> >> Use the "ibm,migratable-partition" root node property to determine >> whether this partition/guest can be migrated. Mark freed pages unused >> if so (or if CMO is in use, as before). >> >> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> >> --- >> Several things yet to improve here: >> >> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled >> from CONFIG_PPC_SMLPAR. >> >> * powerpc's arch_free_page() could be made to use a static key if >> justified. >> >> * I have not yet measured the overhead this introduces, nor have I >> measured the benefit to a live migration. >> >> To date, I have smoke tested it by doing a live migration and >> performing a build on a kernel with the change, to ensure it doesn't >> introduce obvious memory corruption or anything. It hasn't blown up >> yet :-) >> >> This will be a possibly significant behavior change in that we will be >> flagging pages unused where we typically did not before. Until now, >> having CMO enabled was the only way to do this, and I don't think that >> feature is used all that much? > > Yeah AFAIK it has to be explicitly configured and enabled via the HMC, > so doesn't get much testing or usage. > >> Posting this as RFC to see if there are any major concerns. > > My worry is that this will add overhead for everyone in normal usage, an > hcall per freed set of pages, whereas the benefit is only seen when a > migration happens. > > But that does depend on how often arch_free_page() gets called in normal > usage, which I don't know offhand. Yes, and as I said in my followup yesterday: >> for this to be safe, powerpc/pseries needs to implement >> arch_alloc_page() to undo setting the "unused" flag. So, perhaps more significantly, we'd also incur an hcall per arch_alloc_page() with the most straightforward implementation that doesn't eat data (unlike this version!). Nevertheless I'll plan on doing that for the next iteration to see if I can measure the overhead and benefit, with the expectation that we'll ultimately need a more sophisticated design.
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 4561667832ed..b264371d8e12 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -16,6 +16,7 @@ #include <linux/export.h> #include <linux/jump_label.h> #include <linux/delay.h> +#include <linux/of.h> #include <linux/stop_machine.h> #include <linux/spinlock.h> #include <linux/cpuhotplug.h> @@ -1772,17 +1773,25 @@ static void pSeries_set_page_state(struct page *page, int order, } } +static bool migratable_partition; + void arch_free_page(struct page *page, int order) { - if (radix_enabled()) - return; - if (!cmo_free_hint_flag || !firmware_has_feature(FW_FEATURE_CMO)) - return; - - pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED); + if (migratable_partition || + (firmware_has_feature(FW_FEATURE_CMO) && cmo_free_hint_flag)) + pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED); } EXPORT_SYMBOL(arch_free_page); +static int __init check_migratable_partition(void) +{ + struct device_node *root = of_find_node_by_path("/"); + migratable_partition = !!of_find_property(root, "ibm,migratable-partition", NULL); + of_node_put(root); + return 0; +} +machine_device_initcall(pseries, check_migratable_partition); + #endif /* CONFIG_PPC_SMLPAR */ #endif /* CONFIG_PPC_BOOK3S_64 */