Message ID | 200810202219.m9KMJiRX000463@d01av04.pok.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
X-Patchwork-Id: 5144 On Mon Oct 20, 2008 near 12:19:21 GMT, Brian King wrote: > > A new field has been added to the VPA as a method for > the client OS to communicate to firmware the number of > page ins it is performing when running collaborative > memory overcommit. The hypervisor will use this information > to better determine if a partition is experiencing memory > pressure and needs more memory allocated to it. > > Signed-off-by: Brian King <brking@linux.vnet.ibm.com> > --- > > arch/powerpc/include/asm/lppaca.h | 3 ++- > arch/powerpc/kernel/paca.c | 1 + > arch/powerpc/mm/fault.c | 8 ++++++-- > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c > --- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure 2008-10-20 17:13:25.000000000 -0500 > +++ linux-2.6-bjking1/arch/powerpc/mm/fault.c 2008-10-20 17:13:25.000000000 -0500 .. > @@ -318,9 +320,11 @@ good_area: > goto do_sigbus; > BUG(); > } > - if (ret & VM_FAULT_MAJOR) > + if (ret & VM_FAULT_MAJOR) { > current->maj_flt++; > - else > + if (firmware_has_feature(FW_FEATURE_CMO)) > + atomic_inc((atomic_t *)(&(get_lppaca()->page_ins))); > + } else > current->min_flt++; > up_read(&mm->mmap_sem); > return 0; (1) why do we need atomic_inc and the hundreds or thousands of cycles to do the atomic inc in a per-cpu area? (2) assuming we make this a normal increment, should we keep the feature test or just do it unconditionally (ie is the additional load and branch worse that just doing the load and store of the counter -- the address was previously reserved, right? (no question if it has to be atomic). <Ramble things one might consider> Ben asked if we need to worry about the hypervisor clearing the count. If they treat it as a only-incrementing then we don't need to worry. And since its only an indicator, then we may not care about a big count by them interrupting between the load and store If we are worried about linux preemption, then we need to disable it to avoid crossing cpu variables or getting to this point multiple times. (I have not looked at the context to see if we are already disabled). </Ramble> milton
Milton Miller wrote: > X-Patchwork-Id: 5144 >> diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c >> --- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure 2008-10-20 17:13:25.000000000 -0500 >> +++ linux-2.6-bjking1/arch/powerpc/mm/fault.c 2008-10-20 17:13:25.000000000 -0500 > .. >> @@ -318,9 +320,11 @@ good_area: >> goto do_sigbus; >> BUG(); >> } >> - if (ret & VM_FAULT_MAJOR) >> + if (ret & VM_FAULT_MAJOR) { >> current->maj_flt++; >> - else >> + if (firmware_has_feature(FW_FEATURE_CMO)) >> + atomic_inc((atomic_t *)(&(get_lppaca()->page_ins))); >> + } else >> current->min_flt++; >> up_read(&mm->mmap_sem); >> return 0; > > (1) why do we need atomic_inc and the hundreds or thousands of cycles to > do the atomic inc in a per-cpu area? We don't. I've now removed it. > (2) assuming we make this a normal increment, should we keep the > feature test or just do it unconditionally (ie is the additional load > and branch worse that just doing the load and store of the counter -- > the address was previously reserved, right? (no question if > it has to be atomic). Simplified patch on the way... > <Ramble things one might consider> > > Ben asked if we need to worry about the hypervisor clearing the > count. If they treat it as a only-incrementing then we don't need > to worry. And since its only an indicator, then we may not care > about a big count by them interrupting between the load and store This is a read only field from the hypervisor's perspective. They shouldn't be clearing it. > If we are worried about linux preemption, then we need to disable > it to avoid crossing cpu variables or getting to this point multiple > times. (I have not looked at the context to see if we are already > disabled). Looks to me like linux preemption is a possibility in this code, so I've added the code to disable preemption around the increment. -Brian
diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c --- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure 2008-10-20 17:13:25.000000000 -0500 +++ linux-2.6-bjking1/arch/powerpc/mm/fault.c 2008-10-20 17:13:25.000000000 -0500 @@ -30,6 +30,8 @@ #include <linux/kprobes.h> #include <linux/kdebug.h> +#include <asm/atomic.h> +#include <asm/firmware.h> #include <asm/page.h> #include <asm/pgtable.h> #include <asm/mmu.h> @@ -318,9 +320,11 @@ good_area: goto do_sigbus; BUG(); } - if (ret & VM_FAULT_MAJOR) + if (ret & VM_FAULT_MAJOR) { current->maj_flt++; - else + if (firmware_has_feature(FW_FEATURE_CMO)) + atomic_inc((atomic_t *)(&(get_lppaca()->page_ins))); + } else current->min_flt++; up_read(&mm->mmap_sem); return 0; diff -puN arch/powerpc/include/asm/lppaca.h~powerpc_vrm_mm_pressure arch/powerpc/include/asm/lppaca.h --- linux-2.6/arch/powerpc/include/asm/lppaca.h~powerpc_vrm_mm_pressure 2008-10-20 17:13:25.000000000 -0500 +++ linux-2.6-bjking1/arch/powerpc/include/asm/lppaca.h 2008-10-20 17:13:25.000000000 -0500 @@ -133,7 +133,8 @@ struct lppaca { //============================================================================= // CACHE_LINE_4-5 0x0180 - 0x027F Contains PMC interrupt data //============================================================================= - u8 pmc_save_area[256]; // PMC interrupt Area x00-xFF + volatile u32 page_ins; // CMO Hint - # page ins by OS x00-x04 + u8 pmc_save_area[252]; // PMC interrupt Area x04-xFF } __attribute__((__aligned__(0x400))); extern struct lppaca lppaca[]; diff -puN arch/powerpc/kernel/paca.c~powerpc_vrm_mm_pressure arch/powerpc/kernel/paca.c --- linux-2.6/arch/powerpc/kernel/paca.c~powerpc_vrm_mm_pressure 2008-10-20 17:13:25.000000000 -0500 +++ linux-2.6-bjking1/arch/powerpc/kernel/paca.c 2008-10-20 17:13:25.000000000 -0500 @@ -37,6 +37,7 @@ struct lppaca lppaca[] = { .end_of_quantum = 0xfffffffffffffffful, .slb_count = 64, .vmxregs_in_use = 0, + .page_ins = 0, }, };
A new field has been added to the VPA as a method for the client OS to communicate to firmware the number of page ins it is performing when running collaborative memory overcommit. The hypervisor will use this information to better determine if a partition is experiencing memory pressure and needs more memory allocated to it. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> --- arch/powerpc/include/asm/lppaca.h | 3 ++- arch/powerpc/kernel/paca.c | 1 + arch/powerpc/mm/fault.c | 8 ++++++-- 3 files changed, 9 insertions(+), 3 deletions(-)