Message ID | 1439466697-18989-3-git-send-email-haokexin@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Scott Wood |
Headers | show |
On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote: > I didn't find anything unusual. But I think we do need to order the > load/store of esel_next when acquire/release tcd lock. For acquire, > add a data dependency to order the loads of lock and esel_next. > For release, even there already have a "isync" here, but it doesn't > guarantee any memory access order. So we still need "lwsync" for > the two stores for lock and esel_next. I was going to say that esel_next is just a hint and it doesn't really matter if we occasionally get the wrong value, unless it happens often enough to cause more performance degradation than the lwsync causes. However, with the A-008139 workaround we do need to read the same value from esel_next both times. It might be less costly to save/restore an additional register instead of lwsync, though. -Scott
On Thu, Aug 13, 2015 at 10:39:19PM -0500, Scott Wood wrote: > On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote: > > I didn't find anything unusual. But I think we do need to order the > > load/store of esel_next when acquire/release tcd lock. For acquire, > > add a data dependency to order the loads of lock and esel_next. > > For release, even there already have a "isync" here, but it doesn't > > guarantee any memory access order. So we still need "lwsync" for > > the two stores for lock and esel_next. > > I was going to say that esel_next is just a hint and it doesn't really matter > if we occasionally get the wrong value, unless it happens often enough to > cause more performance degradation than the lwsync causes. However, with the > A-008139 workaround we do need to read the same value from esel_next both > times. It might be less costly to save/restore an additional register > instead of lwsync, though. I will try to get some benchmark number to compare which method is a bit better. Do you have any recommended benchmark for a case this is? Thanks, Kevin > > -Scott >
On Fri, 2015-08-14 at 15:13 +0800, Kevin Hao wrote: > On Thu, Aug 13, 2015 at 10:39:19PM -0500, Scott Wood wrote: > > On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote: > > > I didn't find anything unusual. But I think we do need to order the > > > load/store of esel_next when acquire/release tcd lock. For acquire, > > > add a data dependency to order the loads of lock and esel_next. > > > For release, even there already have a "isync" here, but it doesn't > > > guarantee any memory access order. So we still need "lwsync" for > > > the two stores for lock and esel_next. > > > > I was going to say that esel_next is just a hint and it doesn't really > > matter > > if we occasionally get the wrong value, unless it happens often enough to > > cause more performance degradation than the lwsync causes. However, with > > the > > A-008139 workaround we do need to read the same value from esel_next both > > times. It might be less costly to save/restore an additional register > > instead of lwsync, though. > > I will try to get some benchmark number to compare which method is a bit > better. > Do you have any recommended benchmark for a case this is? lmbench lat_mem_rd with a stride chosen to maximize TLB misses. For the uncontended case, one instance; for the contended case, two instances, one pinned to each thread of a core. -Scott
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S index e4185581c5a7..964754911987 100644 --- a/arch/powerpc/mm/tlb_low_64e.S +++ b/arch/powerpc/mm/tlb_low_64e.S @@ -334,6 +334,8 @@ BEGIN_FTR_SECTION /* CPU_FTR_SMT */ * with tlbilx before overwriting. */ + andi r15,r15,0 /* add a data dependency to order the loards */ + add r11,r11,r15 /* between the lock and esel_next */ lbz r15,TCD_ESEL_NEXT(r11) rlwinm r10,r15,16,0xff0000 oris r10,r10,MAS0_TLBSEL(1)@h @@ -447,6 +449,7 @@ BEGIN_FTR_SECTION beq cr1,1f /* no unlock if lock was recursively grabbed */ li r15,0 isync + lwsync stb r15,0(r11) 1: END_FTR_SECTION_IFSET(CPU_FTR_SMT)
I didn't find anything unusual. But I think we do need to order the load/store of esel_next when acquire/release tcd lock. For acquire, add a data dependency to order the loads of lock and esel_next. For release, even there already have a "isync" here, but it doesn't guarantee any memory access order. So we still need "lwsync" for the two stores for lock and esel_next. Signed-off-by: Kevin Hao <haokexin@gmail.com> --- arch/powerpc/mm/tlb_low_64e.S | 3 +++ 1 file changed, 3 insertions(+)