| Message ID | 200810272143.02144.markn@au1.ibm.com (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Commit | 4ec577a28980a0790df3c3dfe9c81f6e2222acfb |
| Delegated to: | Paul Mackerras |
On Oct 27, 2008, at 5:43 AM, Mark Nelson wrote:

> Add a new CPU feature bit, CPU_FTR_UNALIGNED_LD_STD, to be added
> to the 64bit powerpc chips that can do unaligned load double and
> store double (and not take a performance hit from it).
>
> This is added to Power6 and Cell and will be used in an upcoming
> patch to do the alignment in memcpy() only on CPUs that require
> it.
>
> Signed-off-by: Mark Nelson <markn@au1.ibm.com>
> ---
>  arch/powerpc/include/asm/cputable.h |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Not that I have anything against this patch, but is there anything we
can do about the CPU_FTR_ usage for one-off things like this?

- k
Kumar Gala writes:

> Not that I have anything against this patch, but is there anything we
> can do about the CPU_FTR_ usage for one-off things like this?

<blinks>

Can you express that a little more coherently for us, please? :)

Paul.
On Oct 27, 2008, at 10:16 PM, Paul Mackerras wrote:

> Kumar Gala writes:
>
>> Not that I have anything against this patch, but is there anything we
>> can do about the CPU_FTR_ usage for one-off things like this?
>
> <blinks>
>
> Can you express that a little more coherently for us, please? :)

I'm asking if we can come up with a better solution than using
CPU_FTR_ bits for extremely specific uses like this.

- k
Kumar Gala writes:

> I'm asking if we can come up with a better solution than using
> CPU_FTR_ bits for extremely specific uses like this.

I haven't been able to think of an alternative that doesn't amount to
an ad-hoc reimplementation of the CPU feature mechanism.  Do you have
a suggestion?

We have code here that needs to know about the characteristics of the
processor we're running on, and is somewhat performance-critical (and
__copy_tofrom_user, which Mark is working on next, will also use these
bits and is definitely performance-critical).  So it is natural to use
the CPU feature mechanism, which is there precisely to provide this
kind of information.

I assume you're worried about running out of feature bits.  The bit
Mark is adding is in the 64-bit section of the kernel feature mask, so
it doesn't affect 32-bit processors at all.  Anyway, we still have 8
free bits there after Mark's patch.  It's the user feature mask where
we're getting tight, since it's limited to 32 bits and we only have 4
bits free.

Paul.
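[Editorial note: for readers unfamiliar with the mechanism Paul describes, a minimal sketch of how a CPU feature bit is consumed from C code follows. `cpu_has_feature()` and `CPU_FTR_UNALIGNED_LD_STD` come from `asm/cputable.h`; the two copy helpers are hypothetical names invented for illustration — the real memcpy()/__copy_tofrom_user() paths are assembly, patched at boot via feature sections rather than tested at runtime.]

```c
#include <linux/types.h>
#include <asm/cputable.h>	/* cpu_has_feature(), CPU_FTR_UNALIGNED_LD_STD */

/* Hypothetical helpers, declared only so the sketch is self-contained. */
void *memcpy_unaligned_fast(void *dst, const void *src, size_t len);
void *memcpy_aligned_slowpath(void *dst, const void *src, size_t len);

/*
 * Illustrative sketch only: shows how a kernel consumer would branch on
 * the new feature bit.  Not Mark's actual patch.
 */
void *copy_chunk(void *dst, const void *src, size_t len)
{
	if (cpu_has_feature(CPU_FTR_UNALIGNED_LD_STD)) {
		/* Power6/Cell: unaligned ld/std run at full speed,
		 * so skip the source-alignment preamble. */
		return memcpy_unaligned_fast(dst, src, len);
	}
	/* Other CPUs: align the source first to avoid the penalty. */
	return memcpy_aligned_slowpath(dst, src, len);
}
```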
Index: upstream/arch/powerpc/include/asm/cputable.h
===================================================================
--- upstream.orig/arch/powerpc/include/asm/cputable.h
+++ upstream/arch/powerpc/include/asm/cputable.h
@@ -194,6 +194,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 #define CPU_FTR_SAO			LONG_ASM_CONST(0x0020000000000000)
 #define CPU_FTR_CP_USE_DCBTZ		LONG_ASM_CONST(0x0040000000000000)
+#define CPU_FTR_UNALIGNED_LD_STD	LONG_ASM_CONST(0x0080000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -404,7 +405,7 @@ extern const char *powerpc_base_platform
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-	    CPU_FTR_DSCR)
+	    CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD)
 #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -415,7 +416,8 @@ extern const char *powerpc_base_platform
 	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_PAUSE_ZERO | CPU_FTR_CI_LARGE_PAGE | \
-	    CPU_FTR_CELL_TB_BUG | CPU_FTR_CP_USE_DCBTZ)
+	    CPU_FTR_CELL_TB_BUG | CPU_FTR_CP_USE_DCBTZ | \
+	    CPU_FTR_UNALIGNED_LD_STD)
 #define CPU_FTRS_PA6T (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_CI_LARGE_PAGE | \
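[Editorial note: the CPU_FTRS_* masks modified above are attached to processors at boot via the cpu_specs[] table in arch/powerpc/kernel/cputable.c, matched against the processor version register (PVR); identify_cpu() latches the winning entry's mask into cur_cpu_spec->cpu_features, which is what cpu_has_feature() tests. The sketch below is trimmed and reconstructed for illustration — field values are quoted from memory of the 2008-era table, not from this patch.]

```c
#include <linux/init.h>
#include <asm/cputable.h>	/* struct cpu_spec, CPU_FTRS_POWER6 */

/* Trimmed sketch of a cpu_specs[] entry; not part of this patch. */
static struct cpu_spec __initdata cpu_specs_sketch[] = {
	{	/* POWER6 */
		.pvr_mask	= 0xffff0000,
		.pvr_value	= 0x003e0000,
		.cpu_name	= "POWER6 (raw)",
		/* After this patch, CPU_FTRS_POWER6 includes
		 * CPU_FTR_UNALIGNED_LD_STD, so Power6 boxes report
		 * the new capability automatically. */
		.cpu_features	= CPU_FTRS_POWER6,
	},
};
```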
Add a new CPU feature bit, CPU_FTR_UNALIGNED_LD_STD, to be added
to the 64bit powerpc chips that can do unaligned load double and
store double (and not take a performance hit from it).

This is added to Power6 and Cell and will be used in an upcoming
patch to do the alignment in memcpy() only on CPUs that require
it.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
---
 arch/powerpc/include/asm/cputable.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)