| Message ID | 20120621105505.GF20973@bubble.grove.modra.org |
|---|---|
| State | New |
On Thu, Jun 21, 2012 at 6:55 AM, Alan Modra <amodra@gmail.com> wrote:
> A couple of small tweaks to PowerPC atomic operations.  The first
> omits the "cmp; bc; isync" barrier on atomic_load with mem model
> __ATOMIC_CONSUME.  PowerPC pointer loads don't need a barrier.  Ref
> http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> As best I can see, mem_thread_fence should not be changed similarly,
> since __ATOMIC_CONSUME doesn't really make sense on a fence.  So a
> fence with __ATOMIC_CONSUME ought to behave as __ATOMIC_ACQUIRE.
>
> The second tweak forces the address used by load_locked and
> store_conditional to a reg when the address is not legitimate for
> those instructions, saving reload some work, reducing register
> pressure and sometimes code size.  Not a big deal, just something I
> noticed a while ago when looking at libgomp.  eg. (-original, +patched)
> Bootstrapped and regression tested powerpc64-linux.  OK for mainline?
>
> 	* config/rs6000/rs6000.c (rs6000_pre_atomic_barrier): Pass in and
> 	return mem.  Convert to indirect addressing if not indirect or
> 	indexed.  Adjust all callers.

The second patch is okay.

The first patch is controversial.  Richard, Paul McKenney and I discussed
the meaning of load(MEMMODEL_CONSUME).  Let me quote part of Paul's
comments:

What is required from memory_order_consume loads is that they meet the
following constraints:

1) "Load tearing" is prohibited.  In other words, the implementation must
load the value "in one go" or, alternatively, prevent stores from running
concurrently with a multi-instruction load.  A multi-instruction load
would have to be used for large data structures that are declared
"atomic".

2) Similarly, "store tearing" is prohibited.  In other words, the
implementation must store the value "in one go" or, alternatively,
prevent loads from running concurrently with a multi-instruction store.

3) The implementation is prohibited from inventing a load or store not
explicitly coded by the programmer.  In contrast, an implementation might
invent loads and stores for non-atomic variables in order to optimize
register usage.

4) Later loads and stores to which a dependency is carried from the
initial load (see 1.10p9 and 1.10p10 in the standard) cannot be reordered
to precede the load(memory_order_consume).  Other loads and stores
following the load(memory_order_consume) may be freely reordered to
precede the load(memory_order_consume).

A relaxed load is subject to constraints #1-#3 above, but not #4.  In
contrast, a memory_order_acquire load would be subject to #1-#3 above,
but would be subject to a stronger version of #4 that prohibits -any-
load or store following the load(memory_order_consume) from being
reordered prior to that load(memory_order_consume).

The isync is not technically required, but removing it requires subtle
load-store tracking conformance from the compiler and library.  We left
it in to be safe.  I'm curious about Richard and Jakub's current
thoughts.

- David
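To make the consume/acquire distinction in #4 concrete, here is a minimal
C11 sketch; the names node, head, payload and flag are illustrative only
and appear nowhere in the patch:

#include <stdatomic.h>

struct node { int payload; };

_Atomic(struct node *) head;
_Atomic int flag;

int
reader (void)
{
  /* Consume load: later loads that carry a dependency on p (such as
     p->payload) may not be reordered before it.  */
  struct node *p = atomic_load_explicit (&head, memory_order_consume);

  /* Dependency-carried load: ordered after the consume load by #4.  */
  int a = p->payload;

  /* No dependency on p: may be reordered before the consume load.
     An acquire load of head would forbid this reordering too.  */
  int b = atomic_load_explicit (&flag, memory_order_relaxed);

  return a + b;
}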
On Thu, Jun 21, 2012 at 12:08:42PM -0400, David Edelsohn wrote:
> 4) Later loads and stores to which a dependency is carried from the
> initial load (see 1.10p9 and 1.10p10 in the standard) cannot be
> reordered to precede the load(memory_order_consume).  Other loads and
> stores following the load(memory_order_consume) may be freely reordered
> to precede the load(memory_order_consume).
>
> A relaxed load is subject to constraints #1-#3 above, but not #4.  In
> contrast, a memory_order_acquire load would be subject to #1-#3 above,
> but would be subject to a stronger version of #4 that prohibits -any-
> load or store following the load(memory_order_consume) from being
> reordered prior to that load(memory_order_consume).
>
> The isync is not technically required, but removing it requires subtle
> load-store tracking conformance from the compiler and library.  We
> left it in to be safe.  I'm curious about Richard and Jakub's current
> thoughts.

Hmm, OK.  So, let me see if I understand.  We know

  p = atomic_load_n (addr, __ATOMIC_CONSUME);
  i = *p;

needs no barriers on powerpc.  But if we had something like

  p = atomic_load_n (addr, __ATOMIC_CONSUME);
  if (p == &foo)
    {
      i = *p;
      ...
    }

and the compiler optimised "i = *p" to "i = foo", then you'd be in
trouble because the hardware won't see the load from "foo" having any
dependency on p.
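A compilable sketch of the hazard described above, written with the GCC
__atomic built-in; apart from p, foo and addr (taken from the example),
the names are illustrative:

int foo;
int *addr;

int
consume_then_compare (void)
{
  int *p = __atomic_load_n (&addr, __ATOMIC_CONSUME);
  int i = 0;

  if (p == &foo)
    {
      /* As written, this load carries a data dependency on p, which is
	 exactly what the powerpc hardware uses to order it after the
	 consume load.  If the compiler rewrites it as "i = foo", the
	 dependency disappears and the hardware is free to perform the
	 load of foo before the load of p, breaking consume ordering.  */
      i = *p;
    }
  return i;
}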
Index: gcc/config/rs6000/sync.md
===================================================================
--- gcc/config/rs6000/sync.md	(revision 188723)
+++ gcc/config/rs6000/sync.md	(working copy)
@@ -126,8 +126,8 @@
   switch (model)
     {
     case MEMMODEL_RELAXED:
+    case MEMMODEL_CONSUME:
       break;
-    case MEMMODEL_CONSUME:
     case MEMMODEL_ACQUIRE:
     case MEMMODEL_SEQ_CST:
       emit_insn (gen_loadsync (operands[0]));
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 188723)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -16527,9 +16572,19 @@ emit_store_conditional (enum machine_mode mode, rt
 
 /* Expand barriers before and after a load_locked/store_cond sequence.  */
 
-static void
-rs6000_pre_atomic_barrier (enum memmodel model)
+static rtx
+rs6000_pre_atomic_barrier (rtx mem, enum memmodel model)
 {
+  rtx addr = XEXP (mem, 0);
+  int strict_p = (reload_in_progress || reload_completed);
+
+  if (!legitimate_indirect_address_p (addr, strict_p)
+      && !legitimate_indexed_address_p (addr, strict_p))
+    {
+      addr = force_reg (Pmode, addr);
+      mem = replace_equiv_address_nv (mem, addr);
+    }
+
   switch (model)
     {
     case MEMMODEL_RELAXED:
@@ -16546,6 +16601,7 @@ emit_store_conditional (enum machine_mode mode, rt
     default:
       gcc_unreachable ();
     }
+  return mem;
 }
 
 static void
@@ -16684,7 +16740,7 @@ rs6000_expand_atomic_compare_and_swap (rtx operand
   else if (reg_overlap_mentioned_p (retval, oldval))
     oldval = copy_to_reg (oldval);
 
-  rs6000_pre_atomic_barrier (mod_s);
+  mem = rs6000_pre_atomic_barrier (mem, mod_s);
 
   label1 = NULL_RTX;
   if (!is_weak)
@@ -16769,7 +16825,7 @@ rs6000_expand_atomic_exchange (rtx operands[])
       mode = SImode;
     }
 
-  rs6000_pre_atomic_barrier (model);
+  mem = rs6000_pre_atomic_barrier (mem, model);
 
   label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
   emit_label (XEXP (label, 0));
@@ -16853,7 +16909,7 @@ rs6000_expand_atomic_op (enum rtx_code code, rtx m
      mode = SImode;
    }
 
-  rs6000_pre_atomic_barrier (model);
+  mem = rs6000_pre_atomic_barrier (mem, model);
 
   label = gen_label_rtx ();
   emit_label (label);
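For reference, a hypothetical source example (not from the patch or the
mails) of the kind of code the second hunk helps with; the struct and
function names are made up:

struct counters { long pad[4]; long n; };

void
bump (struct counters *c)
{
  /* &c->n is a reg+offset address.  The larx/stcx. instructions only
     accept register-indirect or indexed (reg+reg) addresses, so the
     patched rs6000_pre_atomic_barrier forces such an address into a
     register up front instead of leaving the fixup to reload.  */
  __atomic_add_fetch (&c->n, 1, __ATOMIC_SEQ_CST);
}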