Message ID | 4C41DD32.60503@codesourcery.com |
---|---|
State | New |
Headers | show |
On 07/17/10 10:41, Maxim Kuvyrkov wrote: > On 7/16/10 10:37 PM, Jeff Law wrote: >> On 07/15/10 13:22, Maxim Kuvyrkov wrote: >>> On 7/15/10 8:06 PM, Jeff Law wrote: >>> ... >>>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of >>>> trapping_expr from antloc or transp. >>> >>> I missed that point that we should avoid emitting anything at the ends >>> of basic blocks with abnormal edges. >>> >>> Is the attached patch OK? >> Yes as long as it passes the usual bootstrap & regression test. > > Well, after the regtest I now know the exact purpose of transpout. It > is to avoid moving memory references across calls that can clobber > memory. Without the checks done in compute_transpout a memory > reference can be moved along an abnormal edge and, when that happens, > it gets placed /before/ the call. Placement before the call is intentional. The CALL_INSN (if it has abnormal edges) has to be considered to change control flow. If the hoisted insn is placed after the CALL_INSN, the hoisted insn may not be executed if the CALL_INSN throws (for example). insert_insn_end_basic_block contains the logic to determine where in a block to place the new insn. I *think* PRE avoids this by only inserting in a block where it knows the inserted insn is safe to place anywhere within the block. In the case of a MEM, we must have already determined that the block (and thus any CALL_INSN within the block) doesn't clobber the MEM. Hoisting is a little different in that it only verifies that it should be safe to place the MEM at the end of block. Just so I'm certain I understand the issue. Is the problem that the CALL_INSN is clobbering the value in the MEM, which normally wouldn't be a problem except that we can (in certain cases) place the MEM before the CALL_INSN. Right? One could argue this is a failing of hoist_expr_reaches_here_p which fails to take the corner cases of inserting at the end of a block into account. Looking at insert_insn_end_basic_block we have another class of problems. Say the target block ends with a set & jump insn. The destination of the set might clobber a value needed by an insn hoisted to the end of the block. What a mess we've stumbled into. While logically it makes more sense to fix these issues in hoist_expr_reaches_here_p, I'm not sure I want to force you to open that can-o-worms right now. > Similarly, we want to avoid hoisting potentially trapping expressions > along abnormal edges as that would allow the trap to happen prematurely. Agreed. > > Jeff, I'm sorry this is taking so long; would you please review this > incarnation of the patch. No worries. Sometimes what looks like a simple change turns into a major hairball. It's just part of the development process. > > Bootstrapped and tested on i686-linux (biarch) and arm-linux. Looks pretty good. > + if (!pre_p && MEM_P (e->expr)) > + /* Note memory references that can be clobbered by a call. > + We do not split abnormal edges in HOIST, so would > + a memory reference get hoisted along an abnormal edge, > + it would be placed /before/ the call. Therefore, only > + constant memory references can be hoisted along abnormal > + edges. */ > + { > + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF > + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0))) > + continue; This is the only part I'm struggling with. It looks like you're trying to avoid setting prune_exprs if the MEM's address has certain properties (such as a reference to static memory). However, if my analysis of the problem is correct, that's not sufficient to solve the problem. Plus there's probably better ways to detect static memory references. jeff
On 7/19/10 10:08 PM, Jeff Law wrote: > On 07/17/10 10:41, Maxim Kuvyrkov wrote: >> On 7/16/10 10:37 PM, Jeff Law wrote: >>> On 07/15/10 13:22, Maxim Kuvyrkov wrote: >>>> On 7/15/10 8:06 PM, Jeff Law wrote: >>>> ... >>>>> ? See the e->flags & EDGE_ABNORMAL test prior to removing elements of >>>>> trapping_expr from antloc or transp. >>>> >>>> I missed that point that we should avoid emitting anything at the ends >>>> of basic blocks with abnormal edges. >>>> >>>> Is the attached patch OK? >>> Yes as long as it passes the usual bootstrap & regression test. >> >> Well, after the regtest I now know the exact purpose of transpout. It >> is to avoid moving memory references across calls that can clobber >> memory. Without the checks done in compute_transpout a memory >> reference can be moved along an abnormal edge and, when that happens, >> it gets placed /before/ the call. > Placement before the call is intentional. Certainly. > The CALL_INSN (if it has > abnormal edges) has to be considered to change control flow. If the > hoisted insn is placed after the CALL_INSN, the hoisted insn may not be > executed if the CALL_INSN throws (for example). > insert_insn_end_basic_block contains the logic to determine where in a > block to place the new insn. > > I *think* PRE avoids this by only inserting in a block where it knows > the inserted insn is safe to place anywhere within the block. In the > case of a MEM, we must have already determined that the block (and thus > any CALL_INSN within the block) doesn't clobber the MEM. Hoisting is a > little different in that it only verifies that it should be safe to > place the MEM at the end of block. Agreed. > > Just so I'm certain I understand the issue. Is the problem that the > CALL_INSN is clobbering the value in the MEM, which normally wouldn't be > a problem except that we can (in certain cases) place the MEM before the > CALL_INSN. Right? Correct. > One could argue this is a failing of > hoist_expr_reaches_here_p which fails to take the corner cases of > inserting at the end of a block into account. While that's technically correct, I see hoist_expr_reaches_here_p as designed to perform it's job using antloc, transp, and other data sets, i.e., without examining the instruction stream. It seems simpler to handle corner cases like this one by cleaning up the data sets earlier in the pass. > > Looking at insert_insn_end_basic_block we have another class of > problems. Say the target block ends with a set & jump insn. The > destination of the set might clobber a value needed by an insn hoisted > to the end of the block. What a mess we've stumbled into. In theory, "yes" it may happen that a set and/or jump insns modify memory or have other interesting side effects. In practice, we don't seem to have this problem with any of the ports. >> + if (!pre_p && MEM_P (e->expr)) >> + /* Note memory references that can be clobbered by a call. >> + We do not split abnormal edges in HOIST, so would >> + a memory reference get hoisted along an abnormal edge, >> + it would be placed /before/ the call. Therefore, only >> + constant memory references can be hoisted along abnormal >> + edges. */ >> + { >> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF >> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0))) >> + continue; > > This is the only part I'm struggling with. It looks like you're trying > to avoid setting prune_exprs if the MEM's address has certain properties > (such as a reference to static memory). However, if my analysis of the > problem is correct, that's not sufficient to solve the problem. This code comes straight from compute_transpout. Why do you think these checks aren't sufficient? A memory reference to a constant pool is, well, a constant. It may be that the second check if (MEM_READONLY_P (e->expr) && !MEM_VOLATILE_P (e->expr) && MEM_NOTRAP_P (e->expr)) /* Constant memory reference, e.g., a PIC address. */ continue; implies CONSTANT_POOL_ADDRESS_P, but I prefer to err on the side of caution have both checks in place. Thanks,
On 7/19/10 10:39 PM, Maxim Kuvyrkov wrote: > On 7/19/10 10:08 PM, Jeff Law wrote: Ping. >>> + if (!pre_p && MEM_P (e->expr)) >>> + /* Note memory references that can be clobbered by a call. >>> + We do not split abnormal edges in HOIST, so would >>> + a memory reference get hoisted along an abnormal edge, >>> + it would be placed /before/ the call. Therefore, only >>> + constant memory references can be hoisted along abnormal >>> + edges. */ >>> + { >>> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF >>> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0))) >>> + continue; >> >> This is the only part I'm struggling with. It looks like you're trying >> to avoid setting prune_exprs if the MEM's address has certain properties >> (such as a reference to static memory). However, if my analysis of the >> problem is correct, that's not sufficient to solve the problem. > > This code comes straight from compute_transpout. > > Why do you think these checks aren't sufficient? A memory reference to a > constant pool is, well, a constant. > > It may be that the second check > > if (MEM_READONLY_P (e->expr) > && !MEM_VOLATILE_P (e->expr) > && MEM_NOTRAP_P (e->expr)) > /* Constant memory reference, e.g., a PIC address. */ > continue; > > implies CONSTANT_POOL_ADDRESS_P, but I prefer to err on the side of > caution have both checks in place.
On 07/19/10 12:39, Maxim Kuvyrkov wrote: > > >> >> Just so I'm certain I understand the issue. Is the problem that the >> CALL_INSN is clobbering the value in the MEM, which normally wouldn't be >> a problem except that we can (in certain cases) place the MEM before the >> CALL_INSN. Right? > > Correct. > >> One could argue this is a failing of >> hoist_expr_reaches_here_p which fails to take the corner cases of >> inserting at the end of a block into account. > > While that's technically correct, I see hoist_expr_reaches_here_p as > designed to perform it's job using antloc, transp, and other data > sets, i.e., without examining the instruction stream. It seems > simpler to handle corner cases like this one by cleaning up the data > sets earlier in the pass. But you don't know the target block for the insertion when you're pruning the data sets. Thus you don't know if the target block has any of the odd situations which require insertion earlier in the block. hoist_expr_reaches_here_p is supposed to determine if an expression, if inserted at the "end" of a specific block reaches a later point in the cfg unchanged. Since we know the target block of the insertion we can check for the various odd conditions that require insertion before the actual end of the target block. > >> >> Looking at insert_insn_end_basic_block we have another class of >> problems. Say the target block ends with a set & jump insn. The >> destination of the set might clobber a value needed by an insn hoisted >> to the end of the block. What a mess we've stumbled into. > > In theory, "yes" it may happen that a set and/or jump insns modify > memory or have other interesting side effects. In practice, we don't > seem to have this problem with any of the ports. The fact that we haven't stumbled across this bug doesn't mean the problem does not exist. I think we've largely been lucky because we don't run code hoisting unless optimizing for size and because the combiner is the pass most likely to create these insns and the combiner runs after gcse. I'm not saying you have to address this problem right now (though a comment discussing the issue would be greatly appreciated). > >>> + if (!pre_p && MEM_P (e->expr)) >>> + /* Note memory references that can be clobbered by a call. >>> + We do not split abnormal edges in HOIST, so would >>> + a memory reference get hoisted along an abnormal edge, >>> + it would be placed /before/ the call. Therefore, only >>> + constant memory references can be hoisted along abnormal >>> + edges. */ >>> + { >>> + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF >>> + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0))) >>> + continue; >> >> This is the only part I'm struggling with. It looks like you're trying >> to avoid setting prune_exprs if the MEM's address has certain properties >> (such as a reference to static memory). However, if my analysis of the >> problem is correct, that's not sufficient to solve the problem. > > This code comes straight from compute_transpout. I mis-understood what the code was checking for -- it's dealing with MEMs which are known to be constant and thus can't be clobbered by the call, which clearly isn't a problem. My bad. I think at this point we should go forward with your patch and if you could add a pair of comments to expr_reaches_here_p and the pruning code which touch on the problems with set-and-jump insns as a separate patch, that would be greatly appreciated. Thanks for your patience, Jeff
On 7/27/10 10:07 PM, Jeff Law wrote: > On 07/19/10 12:39, Maxim Kuvyrkov wrote: >> While that's technically correct, I see hoist_expr_reaches_here_p as >> designed to perform it's job using antloc, transp, and other data >> sets, i.e., without examining the instruction stream. It seems simpler >> to handle corner cases like this one by cleaning up the data sets >> earlier in the pass. > But you don't know the target block for the insertion when you're > pruning the data sets. Thus you don't know if the target block has any > of the odd situations which require insertion earlier in the block. I think the fact that there is an "odd" instruction on the path from expression's initial location to the target makes it impossible to hoist it to the target block. Even if the target block is "normal" and instructions can be added immediately at its end the "odd" instruction will clobber the expression. > hoist_expr_reaches_here_p is supposed to determine if an expression, if > inserted at the "end" of a specific block reaches a later point in the > cfg unchanged. Since we know the target block of the insertion we can > check for the various odd conditions that require insertion before the > actual end of the target block. In general, I agree. I just cannot readily come up with a case that would be under-optimized by pruning data sets compared to hoist_expr_reaches_here_p. Anyway, that's, probably, moot at this point. > I think at this point we should go forward with your patch and if you > could add a pair of comments to expr_reaches_here_p and the pruning code > which touch on the problems with set-and-jump insns as a separate patch, > that would be greatly appreciated. Thanks for review!
diff --git a/gcc/gcse.c b/gcc/gcse.c index e506d47..0d80fb2 100644 --- a/gcc/gcse.c +++ b/gcc/gcse.c @@ -468,7 +468,6 @@ static void mark_oprs_set (rtx); static void alloc_cprop_mem (int, int); static void free_cprop_mem (void); static void compute_transp (const_rtx, int, sbitmap *, int); -static void compute_transpout (void); static void compute_local_properties (sbitmap *, sbitmap *, sbitmap *, struct hash_table_d *); static void compute_cprop_data (void); @@ -3172,11 +3171,6 @@ bypass_conditional_jumps (void) /* Nonzero for expressions that are transparent in the block. */ static sbitmap *transp; -/* Nonzero for expressions that are transparent at the end of the block. - This is only zero for expressions killed by abnormal critical edge - created by a calls. */ -static sbitmap *transpout; - /* Nonzero for expressions that are computed (available) in the block. */ static sbitmap *comp; @@ -3240,33 +3234,59 @@ free_pre_mem (void) pre_optimal = pre_redundant = pre_insert_map = pre_delete_map = NULL; } -/* Top level routine to do the dataflow analysis needed by PRE. */ +/* Remove certain expressions from anticipatable and transparent + sets of basic blocks that have incoming abnormal edge. + For PRE remove potentially trapping expressions to avoid placing + them on abnormal edges. For HOIST remove memory references that + can be clobbered by calls. */ static void -compute_pre_data (void) +prune_expressions (bool pre_p) { - sbitmap trapping_expr; - basic_block bb; + sbitmap prune_exprs; unsigned int ui; + basic_block bb; - compute_local_properties (transp, comp, antloc, &expr_hash_table); - sbitmap_vector_zero (ae_kill, last_basic_block); - - /* Collect expressions which might trap. */ - trapping_expr = sbitmap_alloc (expr_hash_table.n_elems); - sbitmap_zero (trapping_expr); + prune_exprs = sbitmap_alloc (expr_hash_table.n_elems); + sbitmap_zero (prune_exprs); for (ui = 0; ui < expr_hash_table.size; ui++) { struct expr *e; for (e = expr_hash_table.table[ui]; e != NULL; e = e->next_same_hash) - if (may_trap_p (e->expr)) - SET_BIT (trapping_expr, e->bitmap_index); - } + { + /* Note potentially trapping expressions. */ + if (may_trap_p (e->expr)) + { + SET_BIT (prune_exprs, e->bitmap_index); + continue; + } - /* Compute ae_kill for each basic block using: + if (!pre_p && MEM_P (e->expr)) + /* Note memory references that can be clobbered by a call. + We do not split abnormal edges in HOIST, so would + a memory reference get hoisted along an abnormal edge, + it would be placed /before/ the call. Therefore, only + constant memory references can be hoisted along abnormal + edges. */ + { + if (GET_CODE (XEXP (e->expr, 0)) == SYMBOL_REF + && CONSTANT_POOL_ADDRESS_P (XEXP (e->expr, 0))) + continue; - ~(TRANSP | COMP) - */ + if (MEM_READONLY_P (e->expr) + && !MEM_VOLATILE_P (e->expr) + && MEM_NOTRAP_P (e->expr)) + /* Constant memory reference, e.g., a PIC address. */ + continue; + + /* ??? Optimally, we would use interprocedural alias + analysis to determine if this mem is actually killed + by this call. */ + + SET_BIT (prune_exprs, e->bitmap_index); + } + } + } FOR_EACH_BB (bb) { @@ -3274,17 +3294,43 @@ compute_pre_data (void) edge_iterator ei; /* If the current block is the destination of an abnormal edge, we - kill all trapping expressions because we won't be able to properly - place the instruction on the edge. So make them neither - anticipatable nor transparent. This is fairly conservative. */ + kill all trapping (for PRE) or memory (for HOIST) expressions + because we won't be able to properly place the instruction on + the edge. So make them neither anticipatable nor transparent. + This is fairly conservative. */ FOR_EACH_EDGE (e, ei, bb->preds) - if (e->flags & EDGE_ABNORMAL) + if ((e->flags & EDGE_ABNORMAL) + && (pre_p || CALL_P (BB_END (e->src)))) { - sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr); - sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr); + sbitmap_difference (antloc[bb->index], + antloc[bb->index], prune_exprs); + sbitmap_difference (transp[bb->index], + transp[bb->index], prune_exprs); break; } + } + + sbitmap_free (prune_exprs); +} + +/* Top level routine to do the dataflow analysis needed by PRE. */ +static void +compute_pre_data (void) +{ + basic_block bb; + + compute_local_properties (transp, comp, antloc, &expr_hash_table); + prune_expressions (true); + sbitmap_vector_zero (ae_kill, last_basic_block); + + /* Compute ae_kill for each basic block using: + + ~(TRANSP | COMP) + */ + + FOR_EACH_BB (bb) + { sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]); sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]); } @@ -3295,7 +3341,6 @@ compute_pre_data (void) antloc = NULL; sbitmap_vector_free (ae_kill); ae_kill = NULL; - sbitmap_free (trapping_expr); } /* PRE utilities */ @@ -4050,52 +4095,6 @@ add_label_notes (rtx x, rtx insn) } } -/* Compute transparent outgoing information for each block. - - An expression is transparent to an edge unless it is killed by - the edge itself. This can only happen with abnormal control flow, - when the edge is traversed through a call. This happens with - non-local labels and exceptions. - - This would not be necessary if we split the edge. While this is - normally impossible for abnormal critical edges, with some effort - it should be possible with exception handling, since we still have - control over which handler should be invoked. But due to increased - EH table sizes, this may not be worthwhile. */ - -static void -compute_transpout (void) -{ - basic_block bb; - unsigned int i; - struct expr *expr; - - sbitmap_vector_ones (transpout, last_basic_block); - - FOR_EACH_BB (bb) - { - /* Note that flow inserted a nop at the end of basic blocks that - end in call instructions for reasons other than abnormal - control flow. */ - if (! CALL_P (BB_END (bb))) - continue; - - for (i = 0; i < expr_hash_table.size; i++) - for (expr = expr_hash_table.table[i]; expr ; expr = expr->next_same_hash) - if (MEM_P (expr->expr)) - { - if (GET_CODE (XEXP (expr->expr, 0)) == SYMBOL_REF - && CONSTANT_POOL_ADDRESS_P (XEXP (expr->expr, 0))) - continue; - - /* ??? Optimally, we would use interprocedural alias - analysis to determine if this mem is actually killed - by this call. */ - RESET_BIT (transpout[bb->index], expr->bitmap_index); - } - } -} - /* Code Hoisting variables and subroutines. */ /* Very busy expressions. */ @@ -4124,7 +4123,6 @@ alloc_code_hoist_mem (int n_blocks, int n_exprs) hoist_vbein = sbitmap_vector_alloc (n_blocks, n_exprs); hoist_vbeout = sbitmap_vector_alloc (n_blocks, n_exprs); hoist_exprs = sbitmap_vector_alloc (n_blocks, n_exprs); - transpout = sbitmap_vector_alloc (n_blocks, n_exprs); } /* Free vars used for code hoisting analysis. */ @@ -4139,7 +4137,6 @@ free_code_hoist_mem (void) sbitmap_vector_free (hoist_vbein); sbitmap_vector_free (hoist_vbeout); sbitmap_vector_free (hoist_exprs); - sbitmap_vector_free (transpout); free_dominance_info (CDI_DOMINATORS); } @@ -4192,7 +4189,7 @@ static void compute_code_hoist_data (void) { compute_local_properties (transp, comp, antloc, &expr_hash_table); - compute_transpout (); + prune_expressions (false); compute_code_hoist_vbeinout (); calculate_dominance_info (CDI_DOMINATORS); if (dump_file) @@ -4294,8 +4291,7 @@ hoist_code (void) { int hoistable = 0; - if (TEST_BIT (hoist_vbeout[bb->index], i) - && TEST_BIT (transpout[bb->index], i)) + if (TEST_BIT (hoist_vbeout[bb->index], i)) { /* We've found a potentially hoistable expression, now we look at every block BB dominates to see if it