Message ID | 20221209135609.55159-1-sebastian.huber@embedded-brains.de |
---|---|
State | New |
Headers | show |
Series | gcov: Fix -fprofile-update=atomic | expand |
On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote: > > The code coverage support uses counters to determine which edges in the control > flow graph were executed. If a counter overflows, then the code coverage > information is invalid. Therefore the counter type should be a 64-bit integer. > In multithreaded applications, it is important that the counter increments are > atomic. This is not the case by default. The user can enable atomic counter > increments through the -fprofile-update=atomic and > -fprofile-update=prefer-atomic options. > > If the hardware supports 64-bit atomic operations, then everything is fine. If > not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic > counter increments will be used. However, if the hardware does not support the > required atomic operations and -fprofile-atomic=update was chosen by the user, > then a warning was issued and as a forced fall-back to non-atomic operations > was done. This is probably not what a user wants. There is still hardware on > the market which does not have atomic operations and is used for multithreaded > applications. A user which selects -fprofile-update=atomic wants consistent > code coverage data and not random data. > > This patch removes the fall-back to non-atomic operations for > -fprofile-update=atomic. If atomic operations in hardware are not available, > then a library call to libatomic is emitted. To mitigate potential performance > issues an optimization for systems which only support 32-bit atomic operations > is provided. Here, the edge counter increments are done like this: > > low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED); > high_inc = low == 0 ? 1 : 0; > __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED); You check for compare_and_swapsi and the old code checks TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte gcov_type is used. But you do not seem to handle the case where neither SImode nor DImode atomic operations are available? Can we instead do if (gcov_type_size == 4) can_support_atomic4 = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi; else if (gcov_type_size == 8) can_support_atomic8 = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi; if (flag_profile_update == PROFILE_UPDATE_ATOMIC && !can_support_atomic4 && !can_support_atomic8) { warning (0, "target does not support atomic profile update, " "single mode is selected"); flag_profile_update = PROFILE_UPDATE_SINGLE; } thus retain the diagnostic & fallback for this case? The existing code also suggests there might be targets with HImode or TImode counters? In another mail you mentioned that for gen_time_profiler this doesn't work but its code relies on flag_profile_update as well. So do we need to split the flag somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when we are doing more than just edge profiling? Thanks, Richard. > gcc/ChangeLog: > > * tree-profile.cc (split_atomic_increment): New. > (gimple_gen_edge_profiler): Split the atomic edge counter increment in > two 32-bit atomic operations if necessary. > (tree_profiling): Remove profile update warning and fall-back. Set > split_atomic_increment if necessary. > --- > gcc/tree-profile.cc | 81 +++++++++++++++++++++++++++++++++------------ > 1 file changed, 59 insertions(+), 22 deletions(-) > > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc > index 2beb49241f2..1d326dde59a 100644 > --- a/gcc/tree-profile.cc > +++ b/gcc/tree-profile.cc > @@ -73,6 +73,17 @@ static GTY(()) tree ic_tuple_var; > static GTY(()) tree ic_tuple_counters_field; > static GTY(()) tree ic_tuple_callee_field; > > +/* If the user selected atomic profile counter updates > + (-fprofile-update=atomic), then the counter updates will be done atomically. > + Ideally, this is done through atomic operations in hardware. If the > + hardware supports only 32-bit atomic increments and gcov_type_node is a > + 64-bit integer type, then for the profile edge counters the increment is > + performed through two separate 32-bit atomic increments. This case is > + indicated by the split_atomic_increment variable begin true. If the > + hardware does not support atomic operations at all, then a library call to > + libatomic is emitted. */ > +static bool split_atomic_increment; > + > /* Do initialization work for the edge profiler. */ > > /* Add code: > @@ -242,30 +253,59 @@ gimple_init_gcov_profiler (void) > void > gimple_gen_edge_profiler (int edgeno, edge e) > { > - tree one; > - > - one = build_int_cst (gcov_type_node, 1); > + const char *name = "PROF_edge_counter"; > + tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno); > + tree one = build_int_cst (gcov_type_node, 1); > > if (flag_profile_update == PROFILE_UPDATE_ATOMIC) > { > - /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */ > - tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno); > - tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32 > - ? BUILT_IN_ATOMIC_FETCH_ADD_8: > - BUILT_IN_ATOMIC_FETCH_ADD_4); > - gcall *stmt = gimple_build_call (f, 3, addr, one, > - build_int_cst (integer_type_node, > - MEMMODEL_RELAXED)); > - gsi_insert_on_edge (e, stmt); > + tree addr = build_fold_addr_expr (ref); > + tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED); > + if (!split_atomic_increment) > + { > + /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */ > + tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32 > + ? BUILT_IN_ATOMIC_FETCH_ADD_8: > + BUILT_IN_ATOMIC_FETCH_ADD_4); > + gcall *stmt = gimple_build_call (f, 3, addr, one, relaxed); > + gsi_insert_on_edge (e, stmt); > + } > + else > + { > + /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED); > + high_inc = low == 0 ? 1 : 0; > + __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */ > + tree zero32 = build_zero_cst (uint32_type_node); > + tree one32 = build_one_cst (uint32_type_node); > + tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name); > + gimple *stmt = gimple_build_assign (addr_high, POINTER_PLUS_EXPR, > + addr, > + build_int_cst (size_type_node, > + 4)); > + gsi_insert_on_edge (e, stmt); > + if (WORDS_BIG_ENDIAN) > + std::swap (addr, addr_high); > + tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4); > + stmt = gimple_build_call (f, 3, addr, one, relaxed); > + tree low = make_temp_ssa_name (uint32_type_node, NULL, name); > + gimple_call_set_lhs (stmt, low); > + gsi_insert_on_edge (e, stmt); > + tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name); > + stmt = gimple_build_assign (is_zero, EQ_EXPR, low, zero32); > + gsi_insert_on_edge (e, stmt); > + tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name); > + stmt = gimple_build_assign (high_inc, COND_EXPR, is_zero, one32, > + zero32); > + gsi_insert_on_edge (e, stmt); > + stmt = gimple_build_call (f, 3, addr_high, high_inc, relaxed); > + gsi_insert_on_edge (e, stmt); > + } > } > else > { > - tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno); > - tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, > - NULL, "PROF_edge_counter"); > + tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name); > gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref); > - gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, > - NULL, "PROF_edge_counter"); > + gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name); > gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR, > gimple_assign_lhs (stmt1), one); > gassign *stmt3 = gimple_build_assign (unshare_expr (ref), > @@ -710,11 +750,8 @@ tree_profiling (void) > > if (flag_profile_update == PROFILE_UPDATE_ATOMIC > && !can_support_atomic) > - { > - warning (0, "target does not support atomic profile update, " > - "single mode is selected"); > - flag_profile_update = PROFILE_UPDATE_SINGLE; > - } > + split_atomic_increment = HAVE_sync_compare_and_swapsi > + || HAVE_atomic_compare_and_swapsi; > else if (flag_profile_update == PROFILE_UPDATE_PREFER_ATOMIC) > flag_profile_update = can_support_atomic > ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE; > -- > 2.35.3 >
On 13/12/2022 15:30, Richard Biener wrote: > On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber > <sebastian.huber@embedded-brains.de> wrote: >> The code coverage support uses counters to determine which edges in the control >> flow graph were executed. If a counter overflows, then the code coverage >> information is invalid. Therefore the counter type should be a 64-bit integer. >> In multithreaded applications, it is important that the counter increments are >> atomic. This is not the case by default. The user can enable atomic counter >> increments through the -fprofile-update=atomic and >> -fprofile-update=prefer-atomic options. >> >> If the hardware supports 64-bit atomic operations, then everything is fine. If >> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic >> counter increments will be used. However, if the hardware does not support the >> required atomic operations and -fprofile-atomic=update was chosen by the user, >> then a warning was issued and as a forced fall-back to non-atomic operations >> was done. This is probably not what a user wants. There is still hardware on >> the market which does not have atomic operations and is used for multithreaded >> applications. A user which selects -fprofile-update=atomic wants consistent >> code coverage data and not random data. >> >> This patch removes the fall-back to non-atomic operations for >> -fprofile-update=atomic. If atomic operations in hardware are not available, >> then a library call to libatomic is emitted. To mitigate potential performance >> issues an optimization for systems which only support 32-bit atomic operations >> is provided. Here, the edge counter increments are done like this: >> >> low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED); >> high_inc = low == 0 ? 1 : 0; >> __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED); > You check for compare_and_swapsi and the old code checks > TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte > gcov_type is used. But you do not seem to handle the case where > neither SImode nor DImode atomic operations are available? Can we instead > do > > if (gcov_type_size == 4) > can_support_atomic4 > = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi; > else if (gcov_type_size == 8) > can_support_atomic8 > = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi; > > if (flag_profile_update == PROFILE_UPDATE_ATOMIC > && !can_support_atomic4 && !can_support_atomic8) > { > warning (0, "target does not support atomic profile update, " > "single mode is selected"); > flag_profile_update = PROFILE_UPDATE_SINGLE; > } > > thus retain the diagnostic & fallback for this case? No, if you select -fprofile-update=atomic, then the updates shall be atomic from my point of view. If a fallback is acceptable, then you can use -fprofile-update=prefer-atomic. Using the fallback in -fprofile-update=atomic is a bug which prevents the use of gcov for multi-threaded applications on the lower end targets which do not have atomic operations in hardware. > The existing > code also suggests > there might be targets with HImode or TImode counters? Sorry, I don't know. > > In another mail you mentioned that for gen_time_profiler this doesn't > work but its > code relies on flag_profile_update as well. So do we need to split the flag > somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when > we are doing more than just edge profiling? If atomic operations are not available in hardware, then with this patch calls to libatomic are emitted. In gen_time_profiler() this is not an issue at all, since the atomic increment is only done if counters[0] == 0 (if I read the code correctly). For example for void f(void) {} we get on riscv: gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic lui a4,%hi(__gcov0.f) li a3,1 addi a4,a4,%lo(__gcov0.f) amoadd.w a5,a3,0(a4) lui a4,%hi(__gcov0.f+4) addi a5,a5,1 seqz a5,a5 addi a4,a4,%lo(__gcov0.f+4) amoadd.w zero,a5,0(a4) ret gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -mbig-endian lui a4,%hi(__gcov0.f+4) li a3,1 addi a4,a4,%lo(__gcov0.f+4) amoadd.w a5,a3,0(a4) lui a4,%hi(__gcov0.f) addi a5,a5,1 seqz a5,a5 addi a4,a4,%lo(__gcov0.f) amoadd.w zero,a5,0(a4) ret gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv64g -mabi=lp64 lui a5,%hi(__gcov0.f) li a4,1 addi a5,a5,%lo(__gcov0.f) amoadd.d zero,a4,0(a5) ret gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv32id lui a0,%hi(__gcov0.f) li a3,0 li a1,1 li a2,0 addi a0,a0,%lo(__gcov0.f) tail __atomic_fetch_add_8 gcc --coverage -O2 -S -o - test.c -fprofile-update=prefer-atomic -march=rv32id lui a4,%hi(__gcov0.f) lw a5,%lo(__gcov0.f)(a4) lw a2,%lo(__gcov0.f+4)(a4) addi a3,a5,1 sltu a5,a3,a5 add a5,a5,a2 sw a3,%lo(__gcov0.f)(a4) sw a5,%lo(__gcov0.f+4)(a4) ret
On Thu, Dec 15, 2022 at 9:34 AM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote: > > On 13/12/2022 15:30, Richard Biener wrote: > > On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber > > <sebastian.huber@embedded-brains.de> wrote: > >> The code coverage support uses counters to determine which edges in the control > >> flow graph were executed. If a counter overflows, then the code coverage > >> information is invalid. Therefore the counter type should be a 64-bit integer. > >> In multithreaded applications, it is important that the counter increments are > >> atomic. This is not the case by default. The user can enable atomic counter > >> increments through the -fprofile-update=atomic and > >> -fprofile-update=prefer-atomic options. > >> > >> If the hardware supports 64-bit atomic operations, then everything is fine. If > >> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic > >> counter increments will be used. However, if the hardware does not support the > >> required atomic operations and -fprofile-atomic=update was chosen by the user, > >> then a warning was issued and as a forced fall-back to non-atomic operations > >> was done. This is probably not what a user wants. There is still hardware on > >> the market which does not have atomic operations and is used for multithreaded > >> applications. A user which selects -fprofile-update=atomic wants consistent > >> code coverage data and not random data. > >> > >> This patch removes the fall-back to non-atomic operations for > >> -fprofile-update=atomic. If atomic operations in hardware are not available, > >> then a library call to libatomic is emitted. To mitigate potential performance > >> issues an optimization for systems which only support 32-bit atomic operations > >> is provided. Here, the edge counter increments are done like this: > >> > >> low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED); > >> high_inc = low == 0 ? 1 : 0; > >> __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED); > > You check for compare_and_swapsi and the old code checks > > TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte > > gcov_type is used. But you do not seem to handle the case where > > neither SImode nor DImode atomic operations are available? Can we instead > > do > > > > if (gcov_type_size == 4) > > can_support_atomic4 > > = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi; > > else if (gcov_type_size == 8) > > can_support_atomic8 > > = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi; > > > > if (flag_profile_update == PROFILE_UPDATE_ATOMIC > > && !can_support_atomic4 && !can_support_atomic8) > > { > > warning (0, "target does not support atomic profile update, " > > "single mode is selected"); > > flag_profile_update = PROFILE_UPDATE_SINGLE; > > } > > > > thus retain the diagnostic & fallback for this case? > > No, if you select -fprofile-update=atomic, then the updates shall be > atomic from my point of view. If a fallback is acceptable, then you can > use -fprofile-update=prefer-atomic. Using the fallback in > -fprofile-update=atomic is a bug which prevents the use of gcov for > multi-threaded applications on the lower end targets which do not have > atomic operations in hardware. Ah, OK. So the fallback there is libatomic calls as well then? Note not all targets support libatomic, for those the failure mode is likely a link error (which might be fine, but we eventually might want to amend documentation to indicate this failure mode). > > The existing > > code also suggests > > there might be targets with HImode or TImode counters? > > Sorry, I don't know. can you conditionalize the split_atomic_increment init on gcov_type_size == 8 please? The patch is missing adjusments to the -fprofile-update documentation. I think the prefer-atomic vs. atomic needs more explanation with this change. otherwise the patch looks OK to me. Thanks, Richard. > > > > In another mail you mentioned that for gen_time_profiler this doesn't > > work but its > > code relies on flag_profile_update as well. So do we need to split the flag > > somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when > > we are doing more than just edge profiling? > > If atomic operations are not available in hardware, then with this patch > calls to libatomic are emitted. In gen_time_profiler() this is not an > issue at all, since the atomic increment is only done if counters[0] == > 0 (if I read the code correctly). > > For example for > > void f(void) {} > > we get on riscv: > > gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic > > lui a4,%hi(__gcov0.f) > li a3,1 > addi a4,a4,%lo(__gcov0.f) > amoadd.w a5,a3,0(a4) > lui a4,%hi(__gcov0.f+4) > addi a5,a5,1 > seqz a5,a5 > addi a4,a4,%lo(__gcov0.f+4) > amoadd.w zero,a5,0(a4) > ret > > gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -mbig-endian > > lui a4,%hi(__gcov0.f+4) > li a3,1 > addi a4,a4,%lo(__gcov0.f+4) > amoadd.w a5,a3,0(a4) > lui a4,%hi(__gcov0.f) > addi a5,a5,1 > seqz a5,a5 > addi a4,a4,%lo(__gcov0.f) > amoadd.w zero,a5,0(a4) > ret > > gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv64g > -mabi=lp64 > > lui a5,%hi(__gcov0.f) > li a4,1 > addi a5,a5,%lo(__gcov0.f) > amoadd.d zero,a4,0(a5) > ret > > gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv32id > > lui a0,%hi(__gcov0.f) > li a3,0 > li a1,1 > li a2,0 > addi a0,a0,%lo(__gcov0.f) > tail __atomic_fetch_add_8 > > gcc --coverage -O2 -S -o - test.c -fprofile-update=prefer-atomic > -march=rv32id > > lui a4,%hi(__gcov0.f) > lw a5,%lo(__gcov0.f)(a4) > lw a2,%lo(__gcov0.f+4)(a4) > addi a3,a5,1 > sltu a5,a3,a5 > add a5,a5,a2 > sw a3,%lo(__gcov0.f)(a4) > sw a5,%lo(__gcov0.f+4)(a4) > ret > > -- > embedded brains GmbH > Herr Sebastian HUBER > Dornierstr. 4 > 82178 Puchheim > Germany > email: sebastian.huber@embedded-brains.de > phone: +49-89-18 94 741 - 16 > fax: +49-89-18 94 741 - 08 > > Registergericht: Amtsgericht München > Registernummer: HRB 157899 > Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler > Unsere Datenschutzerklärung finden Sie hier: > https://embedded-brains.de/datenschutzerklaerung/
On 16.12.22 10:47, Richard Biener wrote: >> No, if you select -fprofile-update=atomic, then the updates shall be >> atomic from my point of view. If a fallback is acceptable, then you can >> use -fprofile-update=prefer-atomic. Using the fallback in >> -fprofile-update=atomic is a bug which prevents the use of gcov for >> multi-threaded applications on the lower end targets which do not have >> atomic operations in hardware. > Ah, OK. So the fallback there is libatomic calls as well then? Note > not all targets support libatomic, for those the failure mode is likely > a link error (which might be fine, but we eventually might want to > amend documentation to indicate this failure mode). It seems these library calls caused issues in the past: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378 One option could be to emit calls to a new libgcov function: __gcov_inc_counter(counter) -> updated value This function could use a __gthread_mutex_t mutex for updates. This ends up probably with quite a bad performance.
On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber <sebastian.huber@embedded-brains.de> wrote: > > On 16.12.22 10:47, Richard Biener wrote: > >> No, if you select -fprofile-update=atomic, then the updates shall be > >> atomic from my point of view. If a fallback is acceptable, then you can > >> use -fprofile-update=prefer-atomic. Using the fallback in > >> -fprofile-update=atomic is a bug which prevents the use of gcov for > >> multi-threaded applications on the lower end targets which do not have > >> atomic operations in hardware. > > Ah, OK. So the fallback there is libatomic calls as well then? Note > > not all targets support libatomic, for those the failure mode is likely > > a link error (which might be fine, but we eventually might want to > > amend documentation to indicate this failure mode). > > It seems these library calls caused issues in the past: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466 > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378 Hmm, those are testsuite-isms in some way but of course users would run into the same issue, needing explicit -latomic (where available). I suppose target specs could automatically add that for -fprofile-update=atomic but this would need to be specified at link time as well then. > One option could be to emit calls to a new libgcov function: > > __gcov_inc_counter(counter) -> updated value > > This function could use a __gthread_mutex_t mutex for updates. This ends > up probably with quite a bad performance. But that's eventually what libatomic will do as well as fallback. I don't have a good idea here. Do you have to explicitely link -latomic on RISCV? Richard. > -- > embedded brains GmbH > Herr Sebastian HUBER > Dornierstr. 4 > 82178 Puchheim > Germany > email: sebastian.huber@embedded-brains.de > phone: +49-89-18 94 741 - 16 > fax: +49-89-18 94 741 - 08 > > Registergericht: Amtsgericht München > Registernummer: HRB 157899 > Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler > Unsere Datenschutzerklärung finden Sie hier: > https://embedded-brains.de/datenschutzerklaerung/
On 16.12.22 13:09, Richard Biener wrote: > On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber > <sebastian.huber@embedded-brains.de> wrote: >> On 16.12.22 10:47, Richard Biener wrote: >>>> No, if you select -fprofile-update=atomic, then the updates shall be >>>> atomic from my point of view. If a fallback is acceptable, then you can >>>> use -fprofile-update=prefer-atomic. Using the fallback in >>>> -fprofile-update=atomic is a bug which prevents the use of gcov for >>>> multi-threaded applications on the lower end targets which do not have >>>> atomic operations in hardware. >>> Ah, OK. So the fallback there is libatomic calls as well then? Note >>> not all targets support libatomic, for those the failure mode is likely >>> a link error (which might be fine, but we eventually might want to >>> amend documentation to indicate this failure mode). >> It seems these library calls caused issues in the past: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466 >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378 > Hmm, those are testsuite-isms in some way but of course > users would run into the same issue, needing explicit > -latomic (where available). I suppose target specs could > automatically add that for -fprofile-update=atomic but this > would need to be specified at link time as well then. Why can't we provide libatomic for all targets? Is gthr.h not always available? > >> One option could be to emit calls to a new libgcov function: >> >> __gcov_inc_counter(counter) -> updated value >> >> This function could use a __gthread_mutex_t mutex for updates. This ends >> up probably with quite a bad performance. > But that's eventually what libatomic will do as well as fallback. For libatomic, hosts can implement a protect_start() and protect_end() function. On RTEMS, this is implemented like this: #include <machine/_libatomic.h> static inline UWORD protect_start (void *ptr) { return _Libatomic_Protect_start (ptr); } static inline void protect_end (void *ptr, UWORD isr_level) { _Libatomic_Protect_end (ptr, isr_level); } On single-core systems, this is mapped to interrupt disable/enable. This is quite efficient (compared to a mutex). > > I don't have a good idea here. > > Do you have to explicitely link -latomic on RISCV? For RTEMS, libatomic is always added to the linker command line: gcc/config/rtems.h:"%{qrtems:--start-group -lrtemsbsp -lrtemscpu -latomic -lc -lgcc --end-group}" For riscv, there seems to be also a linux special case: gcc/config/riscv/linux.h: " %{pthread:" LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION "}" gcc/config/riscv/linux.h:#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC " -latomic "
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc index 2beb49241f2..1d326dde59a 100644 --- a/gcc/tree-profile.cc +++ b/gcc/tree-profile.cc @@ -73,6 +73,17 @@ static GTY(()) tree ic_tuple_var; static GTY(()) tree ic_tuple_counters_field; static GTY(()) tree ic_tuple_callee_field; +/* If the user selected atomic profile counter updates + (-fprofile-update=atomic), then the counter updates will be done atomically. + Ideally, this is done through atomic operations in hardware. If the + hardware supports only 32-bit atomic increments and gcov_type_node is a + 64-bit integer type, then for the profile edge counters the increment is + performed through two separate 32-bit atomic increments. This case is + indicated by the split_atomic_increment variable begin true. If the + hardware does not support atomic operations at all, then a library call to + libatomic is emitted. */ +static bool split_atomic_increment; + /* Do initialization work for the edge profiler. */ /* Add code: @@ -242,30 +253,59 @@ gimple_init_gcov_profiler (void) void gimple_gen_edge_profiler (int edgeno, edge e) { - tree one; - - one = build_int_cst (gcov_type_node, 1); + const char *name = "PROF_edge_counter"; + tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno); + tree one = build_int_cst (gcov_type_node, 1); if (flag_profile_update == PROFILE_UPDATE_ATOMIC) { - /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */ - tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno); - tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32 - ? BUILT_IN_ATOMIC_FETCH_ADD_8: - BUILT_IN_ATOMIC_FETCH_ADD_4); - gcall *stmt = gimple_build_call (f, 3, addr, one, - build_int_cst (integer_type_node, - MEMMODEL_RELAXED)); - gsi_insert_on_edge (e, stmt); + tree addr = build_fold_addr_expr (ref); + tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED); + if (!split_atomic_increment) + { + /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */ + tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32 + ? BUILT_IN_ATOMIC_FETCH_ADD_8: + BUILT_IN_ATOMIC_FETCH_ADD_4); + gcall *stmt = gimple_build_call (f, 3, addr, one, relaxed); + gsi_insert_on_edge (e, stmt); + } + else + { + /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED); + high_inc = low == 0 ? 1 : 0; + __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */ + tree zero32 = build_zero_cst (uint32_type_node); + tree one32 = build_one_cst (uint32_type_node); + tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name); + gimple *stmt = gimple_build_assign (addr_high, POINTER_PLUS_EXPR, + addr, + build_int_cst (size_type_node, + 4)); + gsi_insert_on_edge (e, stmt); + if (WORDS_BIG_ENDIAN) + std::swap (addr, addr_high); + tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4); + stmt = gimple_build_call (f, 3, addr, one, relaxed); + tree low = make_temp_ssa_name (uint32_type_node, NULL, name); + gimple_call_set_lhs (stmt, low); + gsi_insert_on_edge (e, stmt); + tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name); + stmt = gimple_build_assign (is_zero, EQ_EXPR, low, zero32); + gsi_insert_on_edge (e, stmt); + tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name); + stmt = gimple_build_assign (high_inc, COND_EXPR, is_zero, one32, + zero32); + gsi_insert_on_edge (e, stmt); + stmt = gimple_build_call (f, 3, addr_high, high_inc, relaxed); + gsi_insert_on_edge (e, stmt); + } } else { - tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno); - tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, - NULL, "PROF_edge_counter"); + tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name); gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref); - gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, - NULL, "PROF_edge_counter"); + gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name); gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR, gimple_assign_lhs (stmt1), one); gassign *stmt3 = gimple_build_assign (unshare_expr (ref), @@ -710,11 +750,8 @@ tree_profiling (void) if (flag_profile_update == PROFILE_UPDATE_ATOMIC && !can_support_atomic) - { - warning (0, "target does not support atomic profile update, " - "single mode is selected"); - flag_profile_update = PROFILE_UPDATE_SINGLE; - } + split_atomic_increment = HAVE_sync_compare_and_swapsi + || HAVE_atomic_compare_and_swapsi; else if (flag_profile_update == PROFILE_UPDATE_PREFER_ATOMIC) flag_profile_update = can_support_atomic ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;