diff mbox

[RFC] two-phase marking in gt_cleare_cache

Message ID 559A42E9.6000303@mentor.com
State New
Headers show

Commit Message

Tom de Vries July 6, 2015, 8:57 a.m. UTC
Hi,

Using attached untested patch, I managed to minimize a test-case failure 
for PR 66714.

The patch introduces two-phase marking in gt_cleare_cache:
- first phase, it loops over all the hash table entries and removes
   those which are dead
- second phase, it runs over all the live hash table entries and marks
   live items that are reachable from those live entries

By doing so, we make the behaviour of gt_cleare_cache independent of the 
order in which the entries are visited, turning:
- hard-to-trigger bugs which trigger for one visiting order but not for
   another, into
- more easily triggered bugs which trigger for any visiting order.

Any comments?

Thanks,
- Tom

Comments

Richard Biener July 6, 2015, 1:25 p.m. UTC | #1
On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Hi,
>
> Using attached untested patch, I managed to minimize a test-case failure for
> PR 66714.
>
> The patch introduces two-phase marking in gt_cleare_cache:
> - first phase, it loops over all the hash table entries and removes
>   those which are dead
> - second phase, it runs over all the live hash table entries and marks
>   live items that are reachable from those live entries
>
> By doing so, we make the behaviour of gt_cleare_cache independent of the
> order in which the entries are visited, turning:
> - hard-to-trigger bugs which trigger for one visiting order but not for
>   another, into
> - more easily triggered bugs which trigger for any visiting order.
>
> Any comments?

I think it is only half-way correct in your proposed change.  You only
fix the issue for hashes of the same kind.  To truly fix the issue you'd
have to change generated code for gt_clear_caches () and provide
a clearing-only implementation (or pass a operation mode bool to
the core worker in hash-table.h).

Thanks,
Richard.

> Thanks,
> - Tom
Richard Biener July 6, 2015, 1:29 p.m. UTC | #2
On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Hi,
>>
>> Using attached untested patch, I managed to minimize a test-case failure for
>> PR 66714.
>>
>> The patch introduces two-phase marking in gt_cleare_cache:
>> - first phase, it loops over all the hash table entries and removes
>>   those which are dead
>> - second phase, it runs over all the live hash table entries and marks
>>   live items that are reachable from those live entries
>>
>> By doing so, we make the behaviour of gt_cleare_cache independent of the
>> order in which the entries are visited, turning:
>> - hard-to-trigger bugs which trigger for one visiting order but not for
>>   another, into
>> - more easily triggered bugs which trigger for any visiting order.
>>
>> Any comments?
>
> I think it is only half-way correct in your proposed change.  You only
> fix the issue for hashes of the same kind.  To truly fix the issue you'd
> have to change generated code for gt_clear_caches () and provide
> a clearing-only implementation (or pass a operation mode bool to
> the core worker in hash-table.h).

Hmm, and don't we rather want to first mark and _then_ clear?  Because
if entry B in the hash is live and would keep A live then A _is_ kept in the
end but you'll remove it from the hash, possibly no longer using a still
live copy.

Richard.

> Thanks,
> Richard.
>
>> Thanks,
>> - Tom
Tom de Vries July 6, 2015, 5:29 p.m. UTC | #3
On 06/07/15 15:29, Richard Biener wrote:
> On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> Hi,
>>>
>>> Using attached untested patch, I managed to minimize a test-case failure for
>>> PR 66714.
>>>
>>> The patch introduces two-phase marking in gt_cleare_cache:
>>> - first phase, it loops over all the hash table entries and removes
>>>    those which are dead
>>> - second phase, it runs over all the live hash table entries and marks
>>>    live items that are reachable from those live entries
>>>
>>> By doing so, we make the behaviour of gt_cleare_cache independent of the
>>> order in which the entries are visited, turning:
>>> - hard-to-trigger bugs which trigger for one visiting order but not for
>>>    another, into
>>> - more easily triggered bugs which trigger for any visiting order.
>>>
>>> Any comments?
>>
>> I think it is only half-way correct in your proposed change.  You only
>> fix the issue for hashes of the same kind.  To truly fix the issue you'd
>> have to change generated code for gt_clear_caches () and provide
>> a clearing-only implementation (or pass a operation mode bool to
>> the core worker in hash-table.h).
>

[ Btw, we have been discussing a similar issue before: 
https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ]

True, the problem exists at the scope of all variables marked with 
'cache', and this patch addresses the problem only within a single variable.

> Hmm, and don't we rather want to first mark and _then_ clear?

I. In favor of first clear and then mark:

It allows for:
- a lazy one phase implementation for !ENABLE_CHECKING where
   you do a single clear-or-mark phase (so the clear is lazy).
- an eager two phase implementation for ENABLE_CHECKING (where the
   clear is eager)
The approach of first a marking phase and then a clearing phase means 
you always have to do these two phases (you can't do the marking lazily).

First mark and then clear means the marking should be done iteratively. 
Each time you mark something live, another entry in another hash table 
could become live. Marking iteratively could become quite costly.

II. In favor of first mark and then clear:

The users of garbage collection will need to be less precise.

> Because
> if entry B in the hash is live and would keep A live then A _is_ kept in the
> end but you'll remove it from the hash, possibly no longer using a still
> live copy.
>

I'm not sure I understand the scenario you're concerned about, but ... 
say we have
- entry B: item B -> item A
- entry A: item A -> item Z

If you do clear first and mark second, and you start out with item B 
live and item A dead:
- during the clearing phase you clear entry A and keep entry B, and
- during the marking phase you mark item A live.

So we no longer have entry A, but item A is kept and entry B is kept.

Thanks,
- Tom
Richard Biener July 7, 2015, 8:42 a.m. UTC | #4
On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 06/07/15 15:29, Richard Biener wrote:
>>
>> On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries <Tom_deVries@mentor.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Using attached untested patch, I managed to minimize a test-case failure
>>>> for
>>>> PR 66714.
>>>>
>>>> The patch introduces two-phase marking in gt_cleare_cache:
>>>> - first phase, it loops over all the hash table entries and removes
>>>>    those which are dead
>>>> - second phase, it runs over all the live hash table entries and marks
>>>>    live items that are reachable from those live entries
>>>>
>>>> By doing so, we make the behaviour of gt_cleare_cache independent of the
>>>> order in which the entries are visited, turning:
>>>> - hard-to-trigger bugs which trigger for one visiting order but not for
>>>>    another, into
>>>> - more easily triggered bugs which trigger for any visiting order.
>>>>
>>>> Any comments?
>>>
>>>
>>> I think it is only half-way correct in your proposed change.  You only
>>> fix the issue for hashes of the same kind.  To truly fix the issue you'd
>>> have to change generated code for gt_clear_caches () and provide
>>> a clearing-only implementation (or pass a operation mode bool to
>>> the core worker in hash-table.h).
>>
>>
>
> [ Btw, we have been discussing a similar issue before:
> https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ]
>
> True, the problem exists at the scope of all variables marked with 'cache',
> and this patch addresses the problem only within a single variable.
>
>> Hmm, and don't we rather want to first mark and _then_ clear?
>
>
> I. In favor of first clear and then mark:
>
> It allows for:
> - a lazy one phase implementation for !ENABLE_CHECKING where
>   you do a single clear-or-mark phase (so the clear is lazy).
> - an eager two phase implementation for ENABLE_CHECKING (where the
>   clear is eager)
> The approach of first a marking phase and then a clearing phase means you
> always have to do these two phases (you can't do the marking lazily).

True.

> First mark and then clear means the marking should be done iteratively. Each
> time you mark something live, another entry in another hash table could
> become live. Marking iteratively could become quite costly.

I don't see this - marking is done recursively so if one entry makes another
live and that makes another live the usual GC marking recursion will deal
with this?

> II. In favor of first mark and then clear:
>
> The users of garbage collection will need to be less precise.
>
>> Because
>> if entry B in the hash is live and would keep A live then A _is_ kept in
>> the
>> end but you'll remove it from the hash, possibly no longer using a still
>> live copy.
>>
>
> I'm not sure I understand the scenario you're concerned about, but ... say
> we have
> - entry B: item B -> item A
> - entry A: item A -> item Z
>
> If you do clear first and mark second, and you start out with item B live
> and item A dead:
> - during the clearing phase you clear entry A and keep entry B, and
> - during the marking phase you mark item A live.
>
> So we no longer have entry A, but item A is kept and entry B is kept.

Yes.  This makes the cache weaker in that after this GC operation
a lookup of A no longer succeeds but it still is there.

The whole point of your patch was to make the behavior more predictable
and in some way it succeeds (within a cache).  As it is supposed to
put more stress on the cache logic (it's ENABLE_CHECKING only)
it makes sense to clear optimistically (after all it's a cache and not
guaranteed to find a still live entry).  It would be still nice to cover
all caches together because as I remember we've mostly seen issues
of caches interacting.

Richard.

> Thanks,
> - Tom
>
Michael Matz July 7, 2015, 2 p.m. UTC | #5
Hi,

On Mon, 6 Jul 2015, Richard Biener wrote:

> >> By doing so, we make the behaviour of gt_cleare_cache independent of the
> >> order in which the entries are visited, turning:
> >> - hard-to-trigger bugs which trigger for one visiting order but not for
> >>   another, into
> >> - more easily triggered bugs which trigger for any visiting order.
> >>
> >> Any comments?
> >
> > I think it is only half-way correct in your proposed change.  You only
> > fix the issue for hashes of the same kind.  To truly fix the issue you'd
> > have to change generated code for gt_clear_caches () and provide
> > a clearing-only implementation (or pass a operation mode bool to
> > the core worker in hash-table.h).
> 
> Hmm, and don't we rather want to first mark and _then_ clear?  Because 
> if entry B in the hash is live and would keep A live then A _is_ kept in 
> the end but you'll remove it from the hash, possibly no longer using a 
> still live copy.

I don't think such use has ever worked with the caching hash tables.  Even 
the old (before c++) scheme didn't iterate, i.e. if a cache-hash entry A
became life from outside, but it itself kept an entry B live in the hash 
table (with no other references to B) then this never worked (or only by 
luck), because the slot was always cleared if !ggc_marked_p, so if B was 
visited before A it was removed from the hash-table (and in particular B's 
gt_ggc_mx routine was never called, so whatever B needed wasn't even 
marked).

Given this I think the call to gt_ggc_mx is superfluous because it 
wouldn't work relyably for multi-step dependencies anyway.  Hence a 
situation that works with that call in place, and breaking without it is 
actually a bug waiting to be uncovered.


Ciao,
Michael.
diff mbox

Patch

Add checking in gt_cleare_cache

---
 gcc/hash-table.h | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 12e0c96..c2ea112 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -1046,7 +1046,37 @@  gt_cleare_cache (hash_table<H> *h)
   if (!h)
     return;
 
-  for (typename table::iterator iter = h->begin (); iter != h->end (); ++iter)
+  typename table::iterator iter;
+
+#ifdef CHECKING
+  /* Say we have:
+     1. cache entry A, with keep_change_entry (A) == 1, and
+     2. cache entry B, with keep_change_entry (B) == 0, and
+     3. gt_ggc_mx (A) marks things live in such a way that keep_change_entry (B)
+        becomes 1.
+
+     In the loop at the end of the function, if A is visited first, then B is
+     kept.  If B is visited first, it is deleted.
+
+     We don't want the situation that the result of this function is dependent
+     on the order in which the entries are visited, so we consider this
+     situation a bug.
+
+     In order to stabilize the result of the function in presence of the bug, we
+     first clear all entries E with keep_change_entry (E) == 0.  By doing so, we
+     also maximize the impact of the liveness analysis done up until now, which
+     we hope makes it more likely that we run into bugs regarding that analysis.
+     We only do this when checking since it's more expensive.  */
+  for (iter = h->begin (); iter != h->end (); ++iter)
+    if (!table::is_empty (*iter) && !table::is_deleted (*iter))
+      {
+	int res = H::keep_cache_entry (*iter);
+	if (res == 0)
+	  h->clear_slot (&*iter);
+      }
+#endif
+
+  for (iter = h->begin (); iter != h->end (); ++iter)
     if (!table::is_empty (*iter) && !table::is_deleted (*iter))
       {
 	int res = H::keep_cache_entry (*iter);
-- 
1.9.1