Message ID | 4D8B8051.2030307@redhat.com |
---|---|
State | New |
Headers | show |
On 03/24/2011 10:33 AM, Aldy Hernandez wrote: > In the example below we usually hoist "global" into a register or > temporary to avoid reading from it at each step. This would cause a > race if another thread had modified "global" in between iterations. > > for (x=0; x< 5; x++) > sum[x] = global; Um, what? Doesn't the c++ memory model have, like, sequence points or somesuch verbage that includes some language like an "atomic"? Your argument above, in absence of some serializing entity, does not sound right at all. r~
On 03/24/11 14:58, Richard Henderson wrote: > On 03/24/2011 10:33 AM, Aldy Hernandez wrote: >> In the example below we usually hoist "global" into a register or >> temporary to avoid reading from it at each step. This would cause a >> race if another thread had modified "global" in between iterations. >> >> for (x=0; x< 5; x++) >> sum[x] = global; > > Um, what? Doesn't the c++ memory model have, like, sequence points > or somesuch verbage that includes some language like an "atomic"? > > Your argument above, in absence of some serializing entity, does not > sound right at all. This work is independent of the C++ memory model. It is just to prevent the optimizers from introducing new data races due to code movement. This initial pass is just to make the optimizations data race safe so that if you have a program which is proven to be free of data races, you can run the optimizers and it will still be data race free after the optimizers have been run. See the following summary by Andrew (which in turn is based on a paper by Hans Boehm): http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces The code movement is disallowed under specific --param options and will later be used for the C++ memory model, though the kernel folk have been asking for something similar, so it's not C++ specific. Does this make sense now? Or were Andrew and I smoking some heavy duty stuff when reading the specs... and not sharing with you? Aldy
On Thu, 24 Mar 2011, Aldy Hernandez wrote: > This work is independent of the C++ memory model. It is just to prevent the > optimizers from introducing new data races due to code movement. This initial > pass is just to make the optimizations data race safe so that if you have a > program which is proven to be free of data races, you can run the optimizers > and it will still be data race free after the optimizers have been run. > > See the following summary by Andrew (which in turn is based on a paper by Hans > Boehm): > > http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces But hoisting global in this case doesn't result in a data race, since the loop always accesses global and contains no synchronisation code. If it were only accessed conditionally, as in the examples under "Avoiding Speculation" on that page, then there would be a race in hoisting it, but not for the code you gave; any data races with the hoisting would still have been present without it.
> On Thu, 24 Mar 2011, Aldy Hernandez wrote: > >> This work is independent of the C++ memory model. It is just to prevent the >> optimizers from introducing new data races due to code movement. This initial >> pass is just to make the optimizations data race safe so that if you have a >> program which is proven to be free of data races, you can run the optimizers >> and it will still be data race free after the optimizers have been run. >> >> See the following summary by Andrew (which in turn is based on a paper by Hans >> Boehm): >> >> http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces > But hoisting global in this case doesn't result in a data race, since the > loop always accesses global and contains no synchronisation code. If it > were only accessed conditionally, as in the examples under "Avoiding > Speculation" on that page, then there would be a race in hoisting it, but > not for the code you gave; any data races with the hoisting would still > have been present without it. > My fault for not being specific about it... I tend to just use data race as a catch all for all these types of things when talking about them with Aldy. the testcase should have for (x=0; x< limit; x++) sum[x] = global; rather than x<5, so that its not a known loop count. The hoisted load is then not allowed as it become a speculative load of 'global' on the main path which would not otherwise have been there. When the value of limit < 0, this can cause a load race detection on systems with either a software or hardware data race detector. It can be allowed as long as the load happens inside a guard for the loop, but I dont think we are that sophisticated yet. Bottom line is these flags are to prevent the introduction of loads of globals on code paths which didn't have had them before. Andrew
On Fri, Mar 25, 2011 at 12:57 AM, Andrew MacLeod <amacleod@redhat.com> wrote: >> On Thu, 24 Mar 2011, Aldy Hernandez wrote: >> >>> This work is independent of the C++ memory model. It is just to prevent >>> the >>> optimizers from introducing new data races due to code movement. This >>> initial >>> pass is just to make the optimizations data race safe so that if you have >>> a >>> program which is proven to be free of data races, you can run the >>> optimizers >>> and it will still be data race free after the optimizers have been run. >>> >>> See the following summary by Andrew (which in turn is based on a paper by >>> Hans >>> Boehm): >>> >>> http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces >> >> But hoisting global in this case doesn't result in a data race, since the >> loop always accesses global and contains no synchronisation code. If it >> were only accessed conditionally, as in the examples under "Avoiding >> Speculation" on that page, then there would be a race in hoisting it, but >> not for the code you gave; any data races with the hoisting would still >> have been present without it. >> > My fault for not being specific about it... I tend to just use data race as > a catch all for all these types of things when talking about them with Aldy. > > the testcase should have for (x=0; x< limit; x++) sum[x] = global; > rather than x<5, so that its not a known loop count. > > The hoisted load is then not allowed as it become a speculative load of > 'global' on the main path which would not otherwise have been there. When > the value of limit < 0, this can cause a load race detection on systems with > either a software or hardware data race detector. But speculative loads are never a problem. So I'd like to avoid cluttering GCC code with stuff to avoid them. I honestly don't care about diagnostic tools that fail to see if a value read is used or not. Richard. > It can be allowed as long as the load happens inside a guard for the loop, > but I dont think we are that sophisticated yet. > > Bottom line is these flags are to prevent the introduction of loads of > globals on code paths which didn't have had them before. > > Andrew > > >
>>> But hoisting global in this case doesn't result in a data race, since the >>> loop always accesses global and contains no synchronisation code. If it >>> were only accessed conditionally, as in the examples under "Avoiding >>> Speculation" on that page, then there would be a race in hoisting it, but >>> not for the code you gave; any data races with the hoisting would still >>> have been present without it. >>> >> My fault for not being specific about it... I tend to just use data race as >> a catch all for all these types of things when talking about them with Aldy. Arghh... I took Andrew's testcase unmodified, and assumed it did not conform to the model. Sorry for the confusion. >> the testcase should have for (x=0; x< limit; x++) sum[x] = global; >> rather than x<5, so that its not a known loop count. And in this variant, we no longer have a data race. The load from global is guarded by the limit check. > But speculative loads are never a problem. So I'd like to avoid cluttering > GCC code with stuff to avoid them. I honestly don't care about diagnostic > tools that fail to see if a value read is used or not. Frankly, I agree. The standard (and the aforementioned paper) says that load data races are disallowed... "If we had hypothetical hardware (or a virtual machine) that aborted the program when a race was detected, compilers would not be allowed to introduce potentially racing loads... even if the result of the load were unused would change the semantics of the program. Although such environments are not commonplace, they have certainly been proposed, and may well turn out to be very useful." ...but, if there's no hardware for it, I see no need for spending time on this. Do y'all agree, so I can drop this witch hunt? In which case, unfortunately, all the tests I had from Andrew (that actually trigger) are load data races. So I'll have to hunt for invalid speculative stores instead. Back to the drawing board. Thanks folks.
Hi, On Fri, 25 Mar 2011, Aldy Hernandez wrote: > > But speculative loads are never a problem. So I'd like to avoid > > cluttering GCC code with stuff to avoid them. I honestly don't care > > about diagnostic tools that fail to see if a value read is used or > > not. > > Frankly, I agree. The standard (and the aforementioned paper) says that > load data races are disallowed... > > "If we had hypothetical hardware (or a virtual machine) that aborted the > program when a race was detected, compilers would not be allowed to introduce > potentially racing loads... even if the result of the load were unused would > change the semantics of the program. Although such environments are not > commonplace, they have certainly been proposed, and may well turn out to be > very useful." > > ...but, if there's no hardware for it, I see no need for spending time on > this. Even if there was such (IMO useless) hardware, or somebody would waste his time in writing such (equally useless) virtual machine that can detect fabricated problems somebody invented for some standard that never are going to be problems in the real world, we shouldn't feel obliged to uglify GCC for that. (OTOH my opinion about the new c++ memory model is known ;) ) Ciao, Michael.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/25/11 09:20, Michael Matz wrote: > Even if there was such (IMO useless) hardware, or somebody would waste his > time in writing such (equally useless) virtual machine that can detect > fabricated problems somebody invented for some standard that never are > going to be problems in the real world, we shouldn't feel obliged to > uglify GCC for that. > > (OTOH my opinion about the new c++ memory model is known ;) ) I'm not going to chime in on this specific problem; however, it is worth noting that many of the issues raised by the C++0x memory model also affect the linux kernel. In fact, it was the realization that the kernel guys are fighting closely related issues with data races that bumped the priority of the memory model work to a level that we (Red Hat) felt it was necessary to start pushing these issues upstream now. jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNjLUOAAoJEBRtltQi2kC7FTkIAKtk5XF1auo5IPtpYJ1G8Yo8 EjW0a8hMcWFM0lWVt6+t9qStBcSCT35W9XYtWf4EmX+2HX6Hs3+KJIOoKXboqM0j oA05YuQHHGqX2Br9jqlmocfB3E0qW+aCJLi3hu/nJWlMTUbOlf5BXF9sc0Q0Veio oGwF8aMoXG1IXQJ5nq4SU03n6lrBWn7x/4vyQCesdEMzzeOL2/LN0i2FBdZkXXNj xK8+R5uKni/pO5V/mKbm4l0ExRLmgO2iyxiTQO/jGlwfS1EnR4Zj2NiWHlXgWH2B o55zGtfy1xNSONGdTAKwFHk3ShvmatRvZZxX5A+M3NlPBpbF0CHL1la+XaSZkZ8= =ZlUI -----END PGP SIGNATURE-----
On Fri, Mar 25, 2011 at 09:30:22AM -0600, Jeff Law wrote: > I'm not going to chime in on this specific problem; however, it is worth > noting that many of the issues raised by the C++0x memory model also > affect the linux kernel. But what they are seeing are certainly store data races, not load races, because no hw they care about (or no hw at all?) detects the latter. Having options to avoid store data races is useful not just for C++0x memory model compliance and Linux kernel, but e.g. for OpenMP too. > In fact, it was the realization that the kernel guys are fighting > closely related issues with data races that bumped the priority of the > memory model work to a level that we (Red Hat) felt it was necessary to > start pushing these issues upstream now. Jakub
On Fri, Mar 25, 2011 at 4:33 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Fri, Mar 25, 2011 at 09:30:22AM -0600, Jeff Law wrote: >> I'm not going to chime in on this specific problem; however, it is worth >> noting that many of the issues raised by the C++0x memory model also >> affect the linux kernel. > > But what they are seeing are certainly store data races, not load races, > because no hw they care about (or no hw at all?) detects the latter. > Having options to avoid store data races is useful not just for C++0x memory > model compliance and Linux kernel, but e.g. for OpenMP too. And we have in the past removed code that created store data races anyway. There is nothing new here. As stated multiple times in the past things get "interesting" when you look at non-x86 hardware. Richard.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/24/11 17:57, Andrew MacLeod wrote: > My fault for not being specific about it... I tend to just use data race > as a catch all for all these types of things when talking about them > with Aldy. > > the testcase should have for (x=0; x< limit; x++) sum[x] = global; > rather than x<5, so that its not a known loop count. > > The hoisted load is then not allowed as it become a speculative load of > 'global' on the main path which would not otherwise have been there. > When the value of limit < 0, this can cause a load race detection on > systems with either a software or hardware data race detector. We generally don't allow this kind of code motion anyway -- though it's more because we haven't built any kind of analysis to determine that this kind of speculative load is often profitable and can't result in a fault. However, having a switch we can throw to avoid this behaviour (assuming we add this kind of speculative code motion one day) is, IMHO, a good thing. Jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNlHzgAAoJEBRtltQi2kC7h24H/1TJ9gdSKfPwjtiKQL7BLraO yAwvJJJihacO3xM2pkVZIXRVlHwFyqzU03x0ygExMOex3QMuLMByBbpxsk3uQV7t 30xF5MDEdXEmQ7L1jWI7gYXbKZk5lnJhDQKewPePdUbcVzJj3H71SQGP2uWCLMUE 7ToPioN2w3DazoWdnYC/Jqi5JwBLnKpcxkGXQPYbCvqII8W5Xkmlh743Zq5RF7ax q1nAkRmKdsgIq0SJJNF0GVxdzfOQkt1q7yzR7W74PCG1h6Yam6DNsePSbswDSu46 0gxkqnlGb6dkgx1iCHAS0PtNDAurGVCAOd5rijH1yib7C+P48XLmoF10Y13v82I= =D+1J -----END PGP SIGNATURE-----
On Thu, Mar 31, 2011 at 3:08 PM, Jeff Law <law@redhat.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 03/24/11 17:57, Andrew MacLeod wrote: >> My fault for not being specific about it... I tend to just use data race >> as a catch all for all these types of things when talking about them >> with Aldy. >> >> the testcase should have for (x=0; x< limit; x++) sum[x] = global; >> rather than x<5, so that its not a known loop count. >> >> The hoisted load is then not allowed as it become a speculative load of >> 'global' on the main path which would not otherwise have been there. >> When the value of limit < 0, this can cause a load race detection on >> systems with either a software or hardware data race detector. > We generally don't allow this kind of code motion anyway -- though it's > more because we haven't built any kind of analysis to determine that > this kind of speculative load is often profitable and can't result in a > fault. > > However, having a switch we can throw to avoid this behaviour (assuming > we add this kind of speculative code motion one day) is, IMHO, a good thing. If we know the access to global cannot trap I see no reason to disallow speculative reading of it. Richard. > Jeff > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.11 (GNU/Linux) > Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ > > iQEcBAEBAgAGBQJNlHzgAAoJEBRtltQi2kC7h24H/1TJ9gdSKfPwjtiKQL7BLraO > yAwvJJJihacO3xM2pkVZIXRVlHwFyqzU03x0ygExMOex3QMuLMByBbpxsk3uQV7t > 30xF5MDEdXEmQ7L1jWI7gYXbKZk5lnJhDQKewPePdUbcVzJj3H71SQGP2uWCLMUE > 7ToPioN2w3DazoWdnYC/Jqi5JwBLnKpcxkGXQPYbCvqII8W5Xkmlh743Zq5RF7ax > q1nAkRmKdsgIq0SJJNF0GVxdzfOQkt1q7yzR7W74PCG1h6Yam6DNsePSbswDSu46 > 0gxkqnlGb6dkgx1iCHAS0PtNDAurGVCAOd5rijH1yib7C+P48XLmoF10Y13v82I= > =D+1J > -----END PGP SIGNATURE----- >
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/31/11 07:26, Richard Guenther wrote: > On Thu, Mar 31, 2011 at 3:08 PM, Jeff Law <law@redhat.com> wrote: > On 03/24/11 17:57, Andrew MacLeod wrote: >>>> My fault for not being specific about it... I tend to just use data race >>>> as a catch all for all these types of things when talking about them >>>> with Aldy. >>>> >>>> the testcase should have for (x=0; x< limit; x++) sum[x] = global; >>>> rather than x<5, so that its not a known loop count. >>>> >>>> The hoisted load is then not allowed as it become a speculative load of >>>> 'global' on the main path which would not otherwise have been there. >>>> When the value of limit < 0, this can cause a load race detection on >>>> systems with either a software or hardware data race detector. > We generally don't allow this kind of code motion anyway -- though it's > more because we haven't built any kind of analysis to determine that > this kind of speculative load is often profitable and can't result in a > fault. > > However, having a switch we can throw to avoid this behaviour (assuming > we add this kind of speculative code motion one day) is, IMHO, a good thing. > >> If we know the access to global cannot trap I see no reason to disallow >> speculative reading of it. Without trip count estimates the speculative read can introduce a performance regression. jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNlIFYAAoJEBRtltQi2kC7+PsH/jRDrhp30ZgRcKctox4/4w46 GBPWsizmDttIg9INfWejX5xvD6wj++qKwBEjD4+5CiDPnjVg6A+5Kr4NrfyQoINx m/kJHmSlO70jHTkfux24tA+Psyj+eyxEFIsXtntaW06MN0J4umEUceJj1SKygi4Q qVfzR9q8gqDKZSEaNzDDpuZh8ANGt2Te4aAK27zd1aWImqjNdXJquESnr0i9Wyqo Q6539rpfemCmSY5dusu0IGUoEWmKqk7EpfJPSsfZgb5E4FgnDMUCclHcNeMQatAC aDmTVeP/AAtd+9ILwVyu0/j38Es6v5kMUXIRJxhMtjLAIQ6DTSajM7IUmuOs4Lg= =TF6O -----END PGP SIGNATURE-----
Hi, On Thu, 31 Mar 2011, Jeff Law wrote: > Without trip count estimates the speculative read can introduce a > performance regression. Like any other transformation. So what are you trying to say? That therefore introducing a mode in GCC disallowing any speculative reads (even if they happen to improve performance!?) is a bright idea? I sure hope not. The right way to deal with performance concerns is to model them and (try to) avoid them. Not to introduce random restrictions onto a (timing independend) as-if model. So, please let's leave out any disguising arguments about performance in connection with the cxx mem-model, shall we? It has different goals AFAIU, and if it has to resort to performance for showing its usefulness it's bound to fail already. Ciao, Michael.
Index: tree-ssa-loop-im.c =================================================================== --- tree-ssa-loop-im.c (revision 170745) +++ tree-ssa-loop-im.c (working copy) @@ -377,6 +377,11 @@ movement_possibility (gimple stmt) || stmt_could_throw_p (stmt)) return MOVE_IMPOSSIBLE; + /* Do not move loads of globals if under a restrictive memory model. */ + if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0 + && gimple_has_thread_visible_ops (stmt)) + return MOVE_IMPOSSIBLE; + if (is_gimple_call (stmt)) { /* While pure or const call is guaranteed to have no side effects, we Index: tree.h =================================================================== --- tree.h (revision 170745) +++ tree.h (working copy) @@ -3102,6 +3102,10 @@ struct GTY(()) tree_parm_decl { #define DECL_THREAD_LOCAL_P(NODE) \ (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL) +/* Return true if a VAR_DECL is visible from another thread. */ +#define DECL_THREAD_VISIBLE_P(NODE) \ + (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE)) + /* In a non-local VAR_DECL with static storage duration, true if the variable has an initialization priority. If false, the variable will be initialized at the DEFAULT_INIT_PRIORITY. */ Index: cse.c =================================================================== --- cse.c (revision 170745) +++ cse.c (working copy) @@ -2402,6 +2402,19 @@ hash_rtx_cb (const_rtx x, enum machine_m } case MEM: + /* Avoid hoisting loads of globals if under a restrictive memory + model. */ + if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0) + { + tree decl = MEM_EXPR (x); + if (do_not_record_p + && decl && TREE_CODE (decl) == VAR_DECL + && DECL_THREAD_VISIBLE_P (decl)) + { + *do_not_record_p = 1; + return 0; + } + } /* We don't record if marked volatile or if BLKmode since we don't know the size of the move. */ if (do_not_record_p && (MEM_VOLATILE_P (x) || GET_MODE (x) == BLKmode)) Index: cselib.c =================================================================== --- cselib.c (revision 170745) +++ cselib.c (working copy) @@ -1190,6 +1190,16 @@ cselib_lookup_mem (rtx x, int create) cselib_val *mem_elt; struct elt_list *l; + /* Forget any reads from globals if under a restrictive memory + model. */ + if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0) + { + tree decl = MEM_EXPR (x); + if (decl && TREE_CODE (decl) == VAR_DECL + && DECL_THREAD_VISIBLE_P (decl)) + return 0; + } + if (MEM_VOLATILE_P (x) || mode == BLKmode || !cselib_record_memory || (FLOAT_MODE_P (mode) && flag_float_store)) Index: tree-ssa-pre.c =================================================================== --- tree-ssa-pre.c (revision 170745) +++ tree-ssa-pre.c (working copy) @@ -4000,6 +4000,12 @@ compute_avail (void) || stmt_could_throw_p (stmt)) continue; + /* Do not move loads of globals if under a restrictive + memory model. */ + if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0 + && gimple_has_thread_visible_ops (stmt)) + continue; + switch (gimple_code (stmt)) { case GIMPLE_RETURN: Index: gimple.c =================================================================== --- gimple.c (revision 170745) +++ gimple.c (working copy) @@ -2388,6 +2388,36 @@ gimple_rhs_has_side_effects (const_gimpl return false; } +/* Return true if any of the operands in S are visible from another + thread (globals that are not TLS. */ +/* ?? If this becomes a performance penalty, we should cache this + value as we do with volatiles. */ + +bool +gimple_has_thread_visible_ops (const_gimple s) +{ + unsigned int i; + + if (is_gimple_debug (s)) + return false; + + if (is_gimple_assign (s)) + { + /* Skip the LHS. */ + i = 1; + } + else + i = 0; + for (; i < gimple_num_ops (s); i++) + { + tree op = gimple_op (s, i); + if (op && TREE_CODE (op) == VAR_DECL + && DECL_THREAD_VISIBLE_P (op)) + return true; + } + return false; +} + /* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p. Return true if S can trap. When INCLUDE_MEM is true, check whether the memory operations could trap. When INCLUDE_STORES is true and Index: gimple.h =================================================================== --- gimple.h (revision 170745) +++ gimple.h (working copy) @@ -882,6 +882,7 @@ gimple gimple_build_cond_from_tree (tree void gimple_cond_set_condition_from_tree (gimple, tree); bool gimple_has_side_effects (const_gimple); bool gimple_rhs_has_side_effects (const_gimple); +bool gimple_has_thread_visible_ops (const_gimple); bool gimple_could_trap_p (gimple); bool gimple_could_trap_p_1 (gimple, bool, bool); bool gimple_assign_rhs_could_trap_p (gimple);