Message ID: 4C48B32C.30007@codesourcery.com
State: New
> Before you do that, here's a new version.  This corrects a few errors in
> the register lifetime handling, and adds support for moving across two
> basic blocks, which is very useful for switch statements but happens in
> other cases as well.

This implementation really moves insns, whereas cross-jumping, the reversed
transformation, is implemented by means of operations on the CFG.  Although
this is probably not as straightforward in this direction, did you consider
the CFG approach instead?  Wouldn't it simplify the integration into the
cfgcleanup.c framework a little?
On 07/24/2010 12:05 AM, Eric Botcazou wrote:
>> Before you do that, here's a new version.  This corrects a few errors in
>> the register lifetime handling, and adds support for moving across two
>> basic blocks, which is very useful for switch statements but happens in
>> other cases as well.
>
> This implementation really moves insns, whereas cross-jumping, the reversed
> transformation, is implemented by means of operations on the CFG.  Although
> this is probably not as straightforward in this direction, did you consider
> the CFG approach instead?  Wouldn't it simplify the integration into the
> cfgcleanup.c framework a little?

Please be more specific about what you envision.

Bernd
> Please be more specific about what you envision.
See try_crossjump_to_edge and try_crossjump_bb: no code is actually moved,
blocks are split and edges redirected instead.
On 07/24/2010 03:07 PM, Eric Botcazou wrote:
>> Please be more specific about what you envision.
>
> See try_crossjump_to_edge and try_crossjump_bb: no code is actually moved,
> blocks are split and edges redirected instead.

Yeah, but that can't work for this optimization.  I don't see how you can
do it without moving insns across jumps.  Please give an example of how
you would transform code.

Bernd
On 07/26/2010 11:42 AM, Bernd Schmidt wrote:
> On 07/24/2010 03:07 PM, Eric Botcazou wrote:
>>> Please be more specific about what you envision.
>>
>> See try_crossjump_to_edge and try_crossjump_bb: no code is actually moved,
>> blocks are split and edges redirected instead.
>
> Yeah, but that can't work for this optimization.  I don't see how you can
> do it without moving insns across jumps.  Please give an example of how
> you would transform code.

You could:
- split the destination BB before the jump (into BB11 and BB12)
- split the source BBs after the last moved instruction (into BB21 and
  BB22, BB31 and BB32, etc.)
- redirect the jumps to BBn1 (n>=2) to go to BBn2.
- graft BB21 between BB11 and BB12, remove all BBn1 for n>2

I don't know if this is worth it, though; it can always be done later.

Paolo
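To make Paolo's four steps concrete, here is a rough sketch of the shape
they would take in code.  This is purely illustrative: split_before_jump,
split_after_insn, last_moved_insn, redirect_jump_target, graft_block and
discard_block are hypothetical stand-ins for GCC's real cfghooks
primitives (split_block, redirect_edge_and_branch, and friends), not
actual API calls.

  /* Hypothetical sketch of the CFG-only formulation of head merging.
     All helpers here are invented names standing in for the cfghooks
     primitives; only the overall shape matters.  */
  static void
  head_merge_via_cfg (basic_block dest_bb, basic_block *srcs, int n)
  {
    basic_block bb11, bb12;
    basic_block *head_copy = XNEWVEC (basic_block, n);
    basic_block *tail = XNEWVEC (basic_block, n);
    int i;

    /* 1. Split the destination block before its final jump.  */
    split_before_jump (dest_bb, &bb11, &bb12);

    /* 2. Split each source block after the last insn to be moved, so
       each copy of the common head sits in a block of its own.  */
    for (i = 0; i < n; i++)
      split_after_insn (srcs[i], last_moved_insn (srcs[i]),
                        &head_copy[i], &tail[i]);

    /* 3. Jumps that targeted the head copies now skip them.  */
    for (i = 1; i < n; i++)
      redirect_jump_target (head_copy[i], tail[i]);

    /* 4. Keep one copy of the common head between the two halves of
       the destination block; discard the unreachable duplicates.  */
    graft_block (head_copy[0], bb11, bb12);
    for (i = 1; i < n; i++)
      discard_block (head_copy[i]);

    free (head_copy);
    free (tail);
  }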
On 07/26/2010 03:40 PM, Paolo Bonzini wrote:
> You could:
> - split the destination BB before the jump (into BB11 and BB12)
> - split the source BBs after the last moved instruction (into BB21 and
>   BB22, BB31 and BB32, etc.)
> - redirect the jumps to BBn1 (n>=2) to go to BBn2.
> - graft BB21 between BB11 and BB12, remove all BBn1 for n>2

How is this simpler and better than just having a single line calling
reorder_insns?  It seems pointless given that it produces the same
result, with a lot more effort.

Bernd
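The single reorder_insns call Bernd refers to, together with the deletion
of the duplicate copies in the other successor blocks, is condensed here
from try_head_merge_bb in the patch quoted at the end of this thread:

  /* Move the common head insns (headptr[0] .. currptr[0]) out of the
     first successor block to just before MOVE_BEFORE in the
     predecessor, then delete the duplicate copies in the remaining
     successors.  */
  reorder_insns (headptr[0], currptr[0], PREV_INSN (move_before));
  for (ix = 1; ix < nedges; ix++)
    delete_insn_chain (headptr[ix], currptr[ix], false);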
On 07/26/2010 03:49 PM, Bernd Schmidt wrote:
> On 07/26/2010 03:40 PM, Paolo Bonzini wrote:
>> You could:
>> - split the destination BB before the jump (into BB11 and BB12)
>> - split the source BBs after the last moved instruction (into BB21 and
>>   BB22, BB31 and BB32, etc.)
>> - redirect the jumps to BBn1 (n>=2) to go to BBn2.
>> - graft BB21 between BB11 and BB12, remove all BBn1 for n>2
>
> How is this simpler and better than just having a single line calling
> reorder_insns?  It seems pointless given that it produces the same
> result, with a lot more effort.

I can't say I disagree (even if you include in the picture deleting the
duplicated insns in other basic blocks).

Paolo
> How is this simpler and better than just having a single line calling
> reorder_insns?  It seems pointless given that it produces the same
> result, with a lot more effort.

It's the canonical way of doing this kind of transformations these days.
The underlying machinery is supposed to do all the heavy lifting, you just
have to drive it.

In particular, I want to avoid kludges like:

+/* Set to true if we couldn't run an optimization due to stale liveness
+   information; we should run df_analyze to enable more opportunities.  */
+static bool block_was_dirty;

@@ -2182,6 +2449,9 @@ try_optimize_cfg (int mode)
       && try_crossjump_bb (mode, EXIT_BLOCK_PTR))
     changed = true;

+  if (block_was_dirty)
+    df_analyze ();
+
 #ifdef ENABLE_CHECKING
   if (changed)
     verify_flow_info ();

that shouldn't be necessary.
On 07/27/2010 09:44 AM, Eric Botcazou wrote:
>> How is this simpler and better than just having a single line calling
>> reorder_insns?  It seems pointless given that it produces the same
>> result, with a lot more effort.
>
> It's the canonical way of doing this kind of transformations these days.
> The underlying machinery is supposed to do all the heavy lifting, you just
> have to drive it.

That's a non-argument, and false IMO.  Not every optimization can be
represented cleanly as a set of CFG manipulations, and this one can't.
Are you saying we should restrict ourselves to not doing it?

> In particular, I want to avoid kludges like:
>
> +/* Set to true if we couldn't run an optimization due to stale liveness
> +   information; we should run df_analyze to enable more opportunities.  */
> +static bool block_was_dirty;
>
> @@ -2182,6 +2449,9 @@ try_optimize_cfg (int mode)
>        && try_crossjump_bb (mode, EXIT_BLOCK_PTR))
>      changed = true;
>
> +  if (block_was_dirty)
> +    df_analyze ();
> +
>  #ifdef ENABLE_CHECKING
>    if (changed)
>      verify_flow_info ();
>
> that shouldn't be necessary.

Still not an argument.  Why shouldn't it be necessary?  It is logical
that by moving code, we change the liveness of registers.  We have to
verify the liveness of registers before moving code, hence, to iterate,
we have to recompute it.

Bernd
On 07/27/2010 11:21 AM, Bernd Schmidt wrote:
> On 07/27/2010 09:44 AM, Eric Botcazou wrote:
>> In particular, I want to avoid kludges like:
>>
>> +/* Set to true if we couldn't run an optimization due to stale liveness
>> +   information; we should run df_analyze to enable more opportunities.  */
>> +static bool block_was_dirty;
>>
>> @@ -2182,6 +2449,9 @@ try_optimize_cfg (int mode)
>>        && try_crossjump_bb (mode, EXIT_BLOCK_PTR))
>>      changed = true;
>>
>> +  if (block_was_dirty)
>> +    df_analyze ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    if (changed)
>>      verify_flow_info ();
>>
>> that shouldn't be necessary.
>
> Still not an argument.  Why shouldn't it be necessary?  It is logical
> that by moving code, we change the liveness of registers.  We have to
> verify the liveness of registers before moving code, hence, to iterate,
> we have to recompute it.

BTW, this is essentially the same thing that's done in the main loop in
ifcvt.c (do you also see it as a kludge there?), where I originally
implemented this optimization, and you were the one who suggested I move
it to cfgcleanup.c.  Maybe you misunderstood the optimization back then
and thought it was just CFG manipulation?  That's simply not the case;
the analogy with crossjumping doesn't entirely hold.

Please explain what you are thinking.  If you have a clever way to do it,
show it.  If you have just not thought it through sufficiently, please do
not continue to hold up a useful improvement.

Bernd
On 07/23/10 16:05, Eric Botcazou wrote:
>> Before you do that, here's a new version.  This corrects a few errors in
>> the register lifetime handling, and adds support for moving across two
>> basic blocks, which is very useful for switch statements but happens in
>> other cases as well.
> This implementation really moves insns, whereas cross-jumping, the reversed
> transformation, is implemented by means of operations on the CFG.  Although
> this is probably not as straightforward in this direction, did you consider
> the CFG approach instead?  Wouldn't it simplify the integration into the
> cfgcleanup.c framework a little?

It's probably worth noting that these optimizations are more effective
when they're allowed to move insns.  So while limiting to CFG approaches
may simplify things, it also leads to fewer opportunities to commonize
code.

jeff
On 07/27/10 03:21, Bernd Schmidt wrote:
> In particular, I want to avoid kludges like:
>> +/* Set to true if we couldn't run an optimization due to stale liveness
>> +   information; we should run df_analyze to enable more opportunities.  */
>> +static bool block_was_dirty;
>>
>> @@ -2182,6 +2449,9 @@ try_optimize_cfg (int mode)
>>        && try_crossjump_bb (mode, EXIT_BLOCK_PTR))
>>      changed = true;
>>
>> +  if (block_was_dirty)
>> +    df_analyze ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    if (changed)
>>      verify_flow_info ();
>>
>> that shouldn't be necessary.
> Still not an argument.  Why shouldn't it be necessary?  It is logical
> that by moving code, we change the liveness of registers.  We have to
> verify the liveness of registers before moving code, hence, to iterate,
> we have to recompute it.

It seems to me implementing this optimization well requires insn movement,
which is going to affect register lifetimes.  Furthermore, this
optimization is sitting inside a while (changed) style loop.  At the least
we need to mark blocks where the life data has become inaccurate so that
we don't mis-optimize based on inaccurate life data.  I haven't thought
deeply about the problem, but it may well be the case that as the
cfgcleanup loop iterates, new opportunities may be exposed and thus it'd
be useful to go ahead and update the life information.

What I'm more concerned about is placement of this optimization in
cfgcleanup -- one could argue this optimization isn't strictly a cfg
cleanup given the need to move insns from one block to another (contrast
to our cross jumping implementation, which just scrambles the cfg).  One
could formulate a head merging algorithm which worked solely on the CFG,
but I doubt it's going to be very effective.

jeff
Thanks for looking at this, Jeff.

On 07/27/2010 07:09 PM, Jeff Law wrote:
> It seems to me implementing this optimization well requires insn
> movement which is going to affect register lifetimes.  Furthermore,
> this optimization is sitting inside a while (changed) style loop.  At
> the least we need to mark blocks where the life data has become
> inaccurate so that we don't mis-optimize based on inaccurate life data.
> I haven't thought deeply about the problem, but it may well be the case
> that as the cfgcleanup loop iterates new opportunities may be exposed
> and thus it'd be useful to go ahead and update the life information.

I'm fairly certain I've observed this to happen.  Note that other
transformations done in cfgcleanup can make life information inaccurate
before we even try to run head merging, making it impossible to do the
analysis.

> What I'm more concerned about is placement of this optimization in
> cfgcleanup -- one could argue this optimization isn't strictly a cfg
> cleanup given the need to move insns from one block to another (contrast
> to our cross jumping implementation which just scrambles the cfg).

Originally I'd placed this in ifcvt.c, by analogy with find_if_case_1 and
find_if_case_2, which do some very similar transformations.  Eric
requested I move it to cfgcleanup.c and now seems unhappy about the
consequences.  I can see two possible reasons for the request: handling
switch statements as well as ifs (also possible in ifcvt by placing it
right at the top of find_if_header or doing it before calling
find_if_header), and the possibility that other cfg cleanups expose more
opportunities.

I think in theory, it is probably more powerful to do it in cfg cleanup,
which is why I did not object to the request to do it there.  However, I
did not see anything wrong in principle with the original patch that did
it in ifcvt (and I think it could have been applied at the time, and would
have improved gcc).  I wouldn't necessarily mind putting it back into
ifcvt, but we might need to insert calls to cleanup_cfg to get the full
benefit (possibly restricted to situations where we made a block empty
except for a jump).

> One could formulate a head merging algorithm which worked solely on the
> CFG, but I doubt it's going to be very effective.

Well, I don't know what if anything Eric has in mind, but assuming we have

BB1
lots of stuff
if (x) goto A;
BB2
y = 1;
goto C;
BB3
A: y = 1;
goto D;

how can we possibly avoid code movement?  The whole purpose is that we
want to have only one copy of the assignment, and that only works if it's
before the jump.  Never mind that we couldn't merge it into the end of BB1
since the jump can't be in the middle of a basic block.

So it seems fairly obvious to me that any kind of simple CFG manipulation
fails entirely to achieve the right result.  Even if it can be thought of
as "reverse" cross jumping, in terms of implementation the transformations
in ifcvt are a somewhat better match than the crossjump code.  For one
thing, we need almost exactly the same code to check liveness information.
I think we can also discard the suggestion of simulating the effect of a
single reorder_insns call with a series of complex CFG transformations, as
that seems entirely pointless.

Bernd
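For reference, the result head merging aims for on this example, in the
same notation (a sketch: the single surviving copy of the assignment is
hoisted into BB1, before the jump), is:

BB1
lots of stuff
y = 1;
if (x) goto A;
BB2
goto C;
BB3
A: goto D;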
> It's probably worth noting that these optimizations are more effective
> when they're allowed to move insns.  So while limiting to CFG
> approaches may simplify things, it also leads to fewer opportunities to
> commonize code.

Do you have a concrete example of such an optimization that would be
doable with code movements but not with CFG manipulations?
> That's a non-argument, and false IMO.  Not every optimization can be
> represented cleanly as a set of CFG manipulations, and this one can't.

I wrote "this kind of transformations", not "all transformations".

What about the algorithm sketched by Paolo?

> Are you saying we should restrict ourselves to not doing it?

I'm saying that optimizations run in cfgcleanup.c must play by the rules.

> Still not an argument.  Why shouldn't it be necessary?  It is logical
> that by moving code, we change the liveness of registers.  We have to
> verify the liveness of registers before moving code, hence, to iterate,
> we have to recompute it.

Because this will be done automatically if you use the appropriate API.
> BTW, this is essentially the same thing that's done in the main loop in
> ifcvt.c (do you also see it as a kludge there?), where I originally
> implemented this optimization, and you were the one who suggested I move
> it to cfgcleanup.c.  Maybe you misunderstood the optimization back then
> and thought it was just CFG manipulation?  That's simply not the case;
> the analogy with crossjumping doesn't entirely hold.

Yes, I still think that it will be more useful in cfgcleanup.c.  And it's
another form of code commonization, so I still think that implementing it
using CFG manipulations is the best approach.

> Please explain what you are thinking.  If you have a clever way to do
> it, show it.  If you have just not thought it through sufficiently,
> please do not continue to hold up a useful improvement.

Fair enough.  Since I don't have enough time at the moment to experiment
myself, I guess I have to withdraw my objections.

Please run this by another maintainer though.
> Well, I don't know what if anything Eric has in mind, but assuming we have
>
> BB1
> lots of stuff
> if (x) goto A;
> BB2
> y = 1;
> goto C;
> BB3
> A: y = 1;
> goto D;
>
> how can we possibly avoid code movement?

Split BB2 and BB3 after "y = 1;" and redirect the edges from BB1.  Then
split BB1 before the test and insert one instance of the common heads.
On 07/28/2010 12:18 AM, Eric Botcazou wrote:
>> That's a non-argument, and false IMO.  Not every optimization can be
>> represented cleanly as a set of CFG manipulations, and this one can't.
>
> I wrote "this kind of transformations", not "all transformations".
>
> What about the algorithm sketched by Paolo?

Paolo's "algorithm" was

- split the destination BB before the jump (into BB11 and BB12)
- split the source BBs after the last moved instruction (into BB21 and
  BB22, BB31 and BB32, etc.)
- redirect the jumps to BBn1 (n>=2) to go to BBn2.
- graft BB21 between BB11 and BB12, remove all BBn1 for n>2

which has exactly the effect of one call to reorder_insns and a few more
to delete_insn, except it creates lots of garbage BBs only to delete them
immediately again.  What exactly is gained by that?  Certainly not
readability.  You're moving insns, so reorder_insns is the correct API.
The suggestion is obviously absurd in my eyes.

>> Are you saying we should restrict ourselves to not doing it?
>
> I'm saying that optimizations run in cfgcleanup.c must play by the rules.

If your "rules" lead to an absurd result, the rules are bogus.  Who
decided those "rules" anyway?

>> Still not an argument.  Why shouldn't it be necessary?  It is logical
>> that by moving code, we change the liveness of registers.  We have to
>> verify the liveness of registers before moving code, hence, to iterate,
>> we have to recompute it.
>
> Because this will be done automatically if you use the appropriate API.

Then I don't think we have the "appropriate API".  The CFG contortions
mentioned above certainly do nothing to solve this problem.  I still think
you're simply missing something here.

> Fair enough.  Since I don't have enough time at the moment to experiment
> myself, I guess I have to withdraw my objections.
>
> Please run this by another maintainer though.

Thanks.

Bernd
On Wed, Jul 28, 2010 at 00:18, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> That's a non-argument, and false IMO.  Not every optimization can be
>> represented cleanly as a set of CFG manipulations, and this one can't.
>
> I wrote "this kind of transformations", not "all transformations".
>
> What about the algorithm sketched by Paolo?

That's what Bernd referred to when he said, "I think we can also discard
the suggestion of simulating the effect of a single reorder_insns call
with a series of complex CFG transformations, as that seems entirely
pointless."

I actually agree with him.  I don't think it is _that_ complex
(particularly because my sketch did more than a single reorder_insns),
but I agree it is pointless.  It is faking that head merging is a pure
CFG transformation when in fact it isn't.

Paolo
> What exactly is gained by that?  Certainly not readability.

Avoiding kludges like the one I already mentioned.

> You're moving insns, so reorder_insns is the correct API.  The suggestion
> is obviously absurd in my eyes.

The first sentence is equally absurd these days because of CFG layout mode.
On 07/28/2010 10:35 AM, Eric Botcazou wrote:
>> What exactly is gained by that?  Certainly not readability.
>
> Avoiding kludges like the one I already mentioned.

As I already pointed out, it's
a) not a kludge, but quite necessary and done like that elsewhere
b) not avoided by pretending a reorder_insns operation is a CFG operation.

Hence, your comment makes no sense.  You've never replied with anything of
substance to either point a) or point b).

>> You're moving insns, so reorder_insns is the correct API.  The suggestion
>> is obviously absurd in my eyes.
>
> The first sentence is equally absurd these days because of CFG layout mode.

Explain how that is relevant to the current discussion.  I believe it's
completely beside the point, as that's only concerned with the layout of
basic blocks, not with the placement of insns within them.  Moving insns
from one basic block to another (while explicitly avoiding touching things
like final jump insns) doesn't affect that, so IMO you're still not making
any sense.

Maybe that's the thing you're missing?  No control flow insns are ever
touched or moved at all by the patch.  The CFG is in every case the same
afterwards as it was before (although it may be cleaned up, but that's a
different job done already by the other code in cfgcleanup).  That's a
pretty strong hint that we're not dealing with a CFG operation.

Your attempts at patch review for this issue have all been drive-by
one-liners like this, which were at best insufficiently explained, and at
worst completely nonsensical.  It has been extremely frustrating.

Bernd
On 07/27/10 16:28, Eric Botcazou wrote:
>> BTW, this is essentially the same thing that's done in the main loop in
>> ifcvt.c (do you also see it as a kludge there?), where I originally
>> implemented this optimization, and you were the one who suggested I move
>> it to cfgcleanup.c.  Maybe you misunderstood the optimization back then
>> and thought it was just CFG manipulation?  That's simply not the case;
>> the analogy with crossjumping doesn't entirely hold.
> Yes, I still think that it will be more useful in cfgcleanup.c.

While I don't think head merging is a perfect fit for cfgcleanup.c, I
don't think it's worth a huge argument.  I'll go along with the final
version (whatever it looks like) in cfgcleanup.c.

> And it's another form of code commonization, so I still think that
> implementing it using CFG manipulations is the best approach.

I think this is the root of the disagreement.  I'll keep looking at it.

> Please run this by another maintainer though.

I'll own reviewing.

Jeff
On 07/27/10 16:38, Eric Botcazou wrote:
>> Well, I don't know what if anything Eric has in mind, but assuming we have
>>
>> BB1
>> lots of stuff
>> if (x) goto A;
>> BB2
>> y = 1;
>> goto C;
>> BB3
>> A: y = 1;
>> goto D;
>>
>> how can we possibly avoid code movement?
> Split BB2 and BB3 after "y = 1;" and redirect the edges from BB1.  Then
> split BB1 before the test and insert one instance of the common heads.

Which to me seems more convoluted than just moving insns implementing the
common code.  And I think that's the whole point behind the disagreement.
While we *can* formulate this as a series of CFG manipulations, I think it
actually makes the resulting transformation more difficult to understand.

Also note that trying to pluck common insns that aren't at the head of the
blocks is more difficult to do as a pure CFG transformation (it's doable,
just ugly) while it ought to be trivial to just move them around.

Jeff
On 07/27/10 16:16, Eric Botcazou wrote:
>> It's probably worth noting that these optimizations are more effective
>> when they're allowed to move insns.  So while limiting to CFG
>> approaches may simplify things, it also leads to fewer opportunities to
>> commonize code.
> Do you have a concrete example of such an optimization that would be
> doable with code movements but not with CFG manipulations?

Think about plucking a common insn from the middle of a block rather than
strictly at the head or tail.

Jeff
On 07/28/2010 07:05 PM, Jeff Law wrote:
> On 07/27/10 16:16, Eric Botcazou wrote:
>>> It's probably worth noting that these optimizations are more
>>> effective when they're allowed to move insns.  So while limiting
>>> to CFG approaches may simplify things, it also leads to fewer
>>> opportunities to commonize code.
>> Do you have a concrete example of such an optimization that would
>> be doable with code movements but not with CFG manipulations?
> Think about plucking a common insn from the middle of a block rather
> than strictly at the head or tail.

In a sense that's what happens here if you consider the middle as anything
between the code_label or note_insn_basic_block and the final jump.  Of
course, moving out of the middle can also be disguised with a CFG scheme
like the one Paolo described (you just need to split a block twice), but
the real question is, why would anyone want to do that?  Just to follow
arbitrary, misunderstood rules?

> Which to me seems more convoluted than just moving insns implementing
> the common code.  And I think that's the whole point behind the
> disagreement.

There must be something else; Eric seems to think using CFG manipulation
would somehow eliminate the need to verify register lifetimes.  If you
agree with me that this is simply impossible, I think we can bury that
part of the issue.

Bernd
On 07/27/10 16:45, Bernd Schmidt wrote:
> On 07/28/2010 12:18 AM, Eric Botcazou wrote:
>>> That's a non-argument, and false IMO.  Not every optimization can be
>>> represented cleanly as a set of CFG manipulations, and this one can't.
>> I wrote "this kind of transformations", not "all transformations".
>>
>> What about the algorithm sketched by Paolo?
> Paolo's "algorithm" was
> - split the destination BB before the jump (into BB11 and BB12)
> - split the source BBs after the last moved instruction (into BB21 and
>   BB22, BB31 and BB32, etc.)
> - redirect the jumps to BBn1 (n>=2) to go to BBn2.
> - graft BB21 between BB11 and BB12, remove all BBn1 for n>2
>
> which has exactly the effect of one call to reorder_insns and a few more
> to delete_insn, except it creates lots of garbage BBs only to delete
> them immediately again.
>
> What exactly is gained by that?  Certainly not readability.  You're
> moving insns, so reorder_insns is the correct API.  The suggestion is
> obviously absurd in my eyes.

Can we all agree that the problem can be viewed as either cfg
manipulations or insn movement and that what we're really arguing about is
which is the most appropriate way to view the problem?  If so, then we
really need to determine which implementation is the easiest to
understand, implement & maintain.

>> I'm saying that optimizations run in cfgcleanup.c must play by the rules.
> If your "rules" lead to an absurd result, the rules are bogus.  Who
> decided those "rules" anyway?

I'm not aware of any such rule.  I can see the value in placing such rules
on cfgcleanup.c's worker bees, which is part of the reason why I
originally suggested this optimization (if implemented as insn movement)
be placed somewhere other than cfgcleanup.

Jeff
On 07/28/2010 08:25 PM, Jeff Law wrote:
>>> I'm saying that optimizations run in cfgcleanup.c must play by the
>>> rules.
>> If your "rules" lead to an absurd result, the rules are bogus.  Who
>> decided those "rules" anyway?
>
> I'm not aware of any such rule.  I can see the value in placing such
> rules on cfgcleanup.c's worker bees, which is part of the reason why I
> originally suggested this optimization (if implemented as insn movement)
> be placed somewhere other than cfgcleanup.

Personally I don't care about how the pass is implemented; I think it fits
more in cfgcleanup.c anyway than in if-conversion (because it doesn't
remove the conditional execution).  There may be another advantage in
putting it in cfgcleanup: using flags to control head-merging may be more
suitable than the relatively rigid pass manager.

Paolo
On 07/28/10 04:04, Bernd Schmidt wrote:
>
> As I already pointed out, it's
> a) not a kludge, but quite necessary and done like that elsewhere
> b) not avoided by pretending a reorder_insns operation is a CFG operation.
>
> Hence, your comment makes no sense.  You've never replied with anything
> of substance to either point a) or point b).

Presumably when we split the blocks we trigger a DF update, either
immediate or deferred.  At least that's what makes sense to me (since in
the CFG formulation there isn't any real insn movement, just block
splitting and edge redirection).  While I see value in having the DF
update happen automagically as a result of our CFG manipulations, I'm
still not seeing how CFG manipulations are a cleaner way to express this
optimization.

> Maybe that's the thing you're missing?  No control flow insns are ever
> touched or moved at all by the patch.  The CFG is in every case the same
> afterwards as it was before (although it may be cleaned up, but that's a
> different job done already by the other code in cfgcleanup).  That's a
> pretty strong hint that we're not dealing with a CFG operation.

I doubt that's what Eric is missing -- there's really two ways to
formulate this optimization, moving insns without manipulating the CFG and
strictly with CFG manipulations.  It appears to me y'all differ on which
of the implementations is preferred.

> Your attempts at patch review for this issue have all been drive-by
> one-liners like this, which were at best insufficiently explained, and
> at worst completely nonsensical.  It has been extremely frustrating.

I can see it's frustrating for both of you -- I'd ask that everyone
remember that you both are advocating a solution you believe in.  It's not
personal, but just a matter of different technical opinions.

jeff
On 07/28/2010 09:39 PM, Jeff Law wrote:
> On 07/28/10 04:04, Bernd Schmidt wrote:
>>
>> As I already pointed out, it's
>> a) not a kludge, but quite necessary and done like that elsewhere
>> b) not avoided by pretending a reorder_insns operation is a CFG
>> operation.
>>
>> Hence, your comment makes no sense.  You've never replied with anything
>> of substance to either point a) or point b).
> Presumably when we split the blocks we trigger a DF update, either
> immediate or deferred.

All I can see in cfgcleanup or cfgrtl are manual calls to df_set_bb_dirty
to show we don't know anything about register lives in modified blocks.
This is also done by my new pass if it modifies stuff.  If we want to get
accurate lifetime data (and we need it to verify we can move insns), we'll
then have to call df_analyze at some point as far as I know.  We certainly
don't want to do that for every change we make, so it has to happen at the
top level when every other possibility has been exhausted.  So I don't see
how we'd avoid the "kludge".

Again, see ifcvt.c, it's precisely the same structure.  The df_analyze
call is new simply because no code in cfgcleanup.c currently needs to look
at register lifetimes.

Bernd
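The structure Bernd describes, condensed from the patch at the end of this
thread, boils down to the following (a simplified sketch: the individual
transformations are collapsed into the hypothetical helper
run_cleanup_passes; block_was_dirty and df_analyze are the real names from
the patch):

  /* Sketch of the deferred-liveness-update pattern in try_optimize_cfg.
     run_cleanup_passes is a hypothetical stand-in for the individual
     transformations; each of them calls df_set_bb_dirty on any block it
     modifies, and head merging additionally sets block_was_dirty when
     stale liveness data blocked an opportunity.  */
  static bool block_was_dirty;

  static void
  cleanup_loop (int mode)
  {
    bool changed;
    do
      {
        block_was_dirty = false;
        changed = run_cleanup_passes (mode);

        /* Recompute liveness once, at the top level, so the next
           iteration can retry the moves that were blocked by stale
           dataflow information.  */
        if (block_was_dirty)
          df_analyze ();
      }
    while (changed);
  }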
On 07/26/2010 03:40 PM, Paolo Bonzini wrote:
> - split the destination BB before the jump (into BB11 and BB12)
> - split the source BBs after the last moved instruction (into BB21 and
>   BB22, BB31 and BB32, etc.)
> - redirect the jumps to BBn1 (n>=2) to go to BBn2.
> - graft BB21 between BB11 and BB12, remove all BBn1 for n>2

The funny thing is that when you look at merge_blocks_move and its
subroutines, which presumably would be used in the last step, you'll find
it's implemented using... reorder_insns!

So, the above creates several useless BB structures, moves them about,
deletes them again, performs roughly the same number of reorder_insns and
delete_insn calls as the code it would be replacing, only to end up with
the same CFG as when it started.  All in order to pretend we're doing a
CFG operation.

Bernd
> I think this is the root of the disagreement.  I'll keep looking at it.
>
>> Please run this by another maintainer though.
>
> I'll own reviewing.

Thanks.
> Can we all agree that the problem can be viewed as either cfg
> manipulations or insn movement and that what we're really arguing about
> is which is the most appropriate way to view the problem?

Yes.

> If so, then we really need to determine which implementation is the
> easiest to understand, implement & maintain.

Yes, and I think only experiments can give a definitive answer.  That's
all I requested from Bernd.
On 07/28/10 15:30, Bernd Schmidt wrote:
> On 07/26/2010 03:40 PM, Paolo Bonzini wrote:
>> - split the destination BB before the jump (into BB11 and BB12)
>> - split the source BBs after the last moved instruction (into BB21 and
>>   BB22, BB31 and BB32, etc.)
>> - redirect the jumps to BBn1 (n>=2) to go to BBn2.
>> - graft BB21 between BB11 and BB12, remove all BBn1 for n>2
> The funny thing is that when you look at merge_blocks_move and its
> subroutines, which presumably would be used in the last step, you'll
> find it's implemented using... reorder_insns!

Quite amusing.  I guess that eliminates the need to ponder an
implementation of reorder_insns as a CFG manipulation and how that would
affect readability of the resulting code -- I wasn't going to suggest we
actually make that change, only ponder the impacts as a proxy for the main
issue we're stuck on.

Jeff
On 07/28/10 13:54, Bernd Schmidt wrote:
> On 07/28/2010 09:39 PM, Jeff Law wrote:
>> On 07/28/10 04:04, Bernd Schmidt wrote:
>>> As I already pointed out, it's
>>> a) not a kludge, but quite necessary and done like that elsewhere
>>> b) not avoided by pretending a reorder_insns operation is a CFG
>>> operation.
>>>
>>> Hence, your comment makes no sense.  You've never replied with anything
>>> of substance to either point a) or point b).
>> Presumably when we split the blocks we trigger a DF update, either
>> immediate or deferred.
> All I can see in cfgcleanup or cfgrtl are manual calls to
> df_set_bb_dirty to show we don't know anything about register lives in
> modified blocks.  This is also done by my new pass if it modifies stuff.

So we've got the markers to allow us to do a deferred update, but with
nothing needing the accurate DF info in the cfgcleanup loop, we just
ignore the markers (within the context of that loop).  Which certainly
makes sense if nothing in cfgcleanup has needed accurate register
lifetimes until now.

I'm having an awful hard time seeing what modeling this optimization as a
series of CFG manipulations is going to win us right now.  It's more
complex than using reorder_insns, it doesn't keep the DF information
up-to-date, and it generates unnecessary churn in the CFG and other data
structures.

Jeff
On 07/29/2010 05:26 PM, Jeff Law wrote:
>> All I can see in cfgcleanup or cfgrtl are manual calls to
>> df_set_bb_dirty to show we don't know anything about register lives in
>> modified blocks.  This is also done by my new pass if it modifies stuff.
>
> So we've got the markers to allow us to do a deferred update, but with
> nothing needing the accurate DF info in the cfgcleanup loop, we just
> ignore the markers (within the context of that loop).  Which certainly
> makes sense if nothing in cfgcleanup has needed accurate register
> lifetimes until now.

Correct.  _Operand scan_ can be made automatic, but not dataflow analysis.

> I'm having an awful hard time seeing what modeling this optimization as
> a series of CFG manipulations is going to win us right now.  It's more
> complex than using reorder_insns, it doesn't keep the DF information
> up-to-date, and it generates unnecessary churn in the CFG and other data
> structures.

Just to state it once more, I agree.  I have two remaining doubts, which
were overshadowed by the discussion so far.  Don't worry, it's small
stuff. :)

I'd like to have a note to the reader that df_analyze is only invoked when
you do crossjumping.  Please add an assert like

  if (block_was_dirty)
    {
      gcc_assert (mode & CLEANUP_CROSSJUMP);
      df_analyze ();
    }

We do not use dataflow otherwise, and it is not necessary to call it
gratuitously.  Passes know that CFG cleanup destroys dataflow and call it
themselves if necessary.

Second, crossjumping is now more expensive.  Does it buy much really to
iterate it?  Something like

  mode &= ~CLEANUP_CROSSJUMP;

just before iterating may still leave it "good enough".  Steven, do you
remember anything?  This anyway can be done separately after the patch
goes in.

Paolo
On 07/29/2010 06:19 PM, Paolo Bonzini wrote:
> I'd like to have a note to the reader that df_analyze is only invoked
> when you do crossjumping.  Please add an assert like
>
>   if (block_was_dirty)
>     {
>       gcc_assert (mode & CLEANUP_CROSSJUMP);
>       df_analyze ();
>     }

Can do.

> We do not use dataflow otherwise, and it is not necessary to call it
> gratuitously.  Passes know that CFG cleanup destroys dataflow and call
> it themselves if necessary.

Then again, we probably won't lose much by calling df_analyze during
cfgcleanup if the following pass needs it anyway - right?

> Second, crossjumping is now more expensive.  Does it buy much really to
> iterate it?  Something like
>
>   mode &= ~CLEANUP_CROSSJUMP;
>
> just before iterating may still leave it "good enough".

A quick experiment shows that this causes many missed opportunities.
(Placed it after the run_fast_dce call).

Another issue that I stumbled across is that cfgcleanup uses the
df_bb_dirty flag for other reasons: it uses it to test whether a block has
changed previously, and retries its optimizations only if that is the
case.  We probably need a different flag to indicate "block changed during
cfgcleanup" and set it from df_bb_dirty just before calling df_analyze.

Bernd
On 07/29/2010 07:00 PM, Bernd Schmidt wrote:
> On 07/29/2010 06:19 PM, Paolo Bonzini wrote:
>> I'd like to have a note to the reader that df_analyze is only invoked
>> when you do crossjumping.  Please add an assert like
>>
>>   if (block_was_dirty)
>>     {
>>       gcc_assert (mode & CLEANUP_CROSSJUMP);
>>       df_analyze ();
>>     }
>
> Can do.
>
>> We do not use dataflow otherwise, and it is not necessary to call it
>> gratuitously.  Passes know that CFG cleanup destroys dataflow and call
>> it themselves if necessary.
>
> Then again, we probably won't lose much by calling df_analyze during
> cfgcleanup if the following pass needs it anyway - right?

What I meant is I want to document that it's for a special case.  I
wouldn't like someone to randomly remove the if just because it happens to
fix his bug.  Certainly I didn't want to imply any further change. :-)

>> Second, crossjumping is now more expensive.  Does it buy much really to
>> iterate it?  Something like
>>
>>   mode &= ~CLEANUP_CROSSJUMP;
>>
>> just before iterating may still leave it "good enough".
>
> A quick experiment shows that this causes many missed opportunities.
> (Placed it after the run_fast_dce call).

Thanks.  Wishful thinking.

Paolo
On 07/27/10 15:35, Bernd Schmidt wrote:
> Thanks for looking at this, Jeff.
>
> On 07/27/2010 07:09 PM, Jeff Law wrote:
>> It seems to me implementing this optimization well requires insn
>> movement which is going to affect register lifetimes.  Furthermore,
>> this optimization is sitting inside a while (changed) style loop.  At
>> the least we need to mark blocks where the life data has become
>> inaccurate so that we don't mis-optimize based on inaccurate life data.
>> I haven't thought deeply about the problem, but it may well be the case
>> that as the cfgcleanup loop iterates new opportunities may be exposed
>> and thus it'd be useful to go ahead and update the life information.
> I'm fairly certain I've observed this to happen.  Note that other
> transformations done in cfgcleanup can make life information inaccurate
> before we even try to run head merging, making it impossible to do the
> analysis.

If that occurs, we will have set block_was_dirty and changed, which tells
head merging to do nothing.  At the bottom of the loop we call df_analyze
to get things up-to-date, then start the next iteration, which then allows
head merging its chance at any blocks which were dirty on the previous
iteration, right?

Presumably there are no blocks marked as dirty when cfgcleanup starts :-)

Jeff
On 07/29/2010 07:23 PM, Jeff Law wrote:
> On 07/27/10 15:35, Bernd Schmidt wrote:
>> Thanks for looking at this, Jeff.
>>
>> On 07/27/2010 07:09 PM, Jeff Law wrote:
>>> It seems to me implementing this optimization well requires insn
>>> movement which is going to affect register lifetimes.  Furthermore,
>>> this optimization is sitting inside a while (changed) style loop.  At
>>> the least we need to mark blocks where the life data has become
>>> inaccurate so that we don't mis-optimize based on inaccurate life data.
>>> I haven't thought deeply about the problem, but it may well be the case
>>> that as the cfgcleanup loop iterates new opportunities may be exposed
>>> and thus it'd be useful to go ahead and update the life information.
>> I'm fairly certain I've observed this to happen.  Note that other
>> transformations done in cfgcleanup can make life information inaccurate
>> before we even try to run head merging, making it impossible to do the
>> analysis.
> If that occurs, we will have set block_was_dirty and changed, which tells
> head merging to do nothing.  At the bottom of the loop we call df_analyze
> to get things up-to-date, then start the next iteration, which then
> allows head merging its chance at any blocks which were dirty on the
> previous iteration, right?

That's the plan.

> Presumably there are no blocks marked as dirty when cfgcleanup starts :-)

I guess there might be, if the previous pass modifies things that affect
liveness.  The effect would be to postpone head-merging until the second
pass of try_cleanup_cfg.  Also, as I said, other cfgcleanup
transformations can mark blocks dirty.  If there's a lot going on, we'll
just have to iterate until nothing can be improved anymore.

Bernd
On Thu, Jul 29, 2010 at 6:19 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> Second, crossjumping is now more expensive.  Does it buy much really to
> iterate it?  Something like
>
>   mode &= ~CLEANUP_CROSSJUMP;
>
> just before iterating may still leave it "good enough".  Steven, do you
> remember anything?  This anyway can be done separately after the patch
> goes in.

Iterating is often helpful.  Crossjumping only merges single pairs of
basic blocks per iteration, but never across a control flow statement.  If
you iterate, you usually find that the previous iteration exposed further
opportunities.  And crossjumping is not very expensive anyway.

<plug> I just hope someone picks up the patches of PR20070 for pre-reload
crossjumping, that's even more helpful than iterating. </plug>

Ciao!
Steven
Index: ifcvt.c =================================================================== --- ifcvt.c (revision 162372) +++ ifcvt.c (working copy) @@ -101,7 +101,6 @@ static int noce_find_if_block (basic_blo static int cond_exec_find_if_block (ce_if_block_t *); static int find_if_case_1 (basic_block, edge, edge); static int find_if_case_2 (basic_block, edge, edge); -static int find_memory (rtx *, void *); static int dead_or_predicable (basic_block, basic_block, basic_block, basic_block, int); static void noce_emit_move_insn (rtx, rtx); @@ -3877,15 +3876,6 @@ find_if_case_2 (basic_block test_bb, edg return TRUE; } -/* A subroutine of dead_or_predicable called through for_each_rtx. - Return 1 if a memory is found. */ - -static int -find_memory (rtx *px, void *data ATTRIBUTE_UNUSED) -{ - return MEM_P (*px); -} - /* Used by the code above to perform the actual rtl transformations. Return TRUE if successful. @@ -3987,131 +3977,38 @@ dead_or_predicable (basic_block test_bb, earliest = jump; } #endif + /* If we allocated new pseudos (e.g. in the conditional move + expander called from noce_emit_cmove), we must resize the + array first. */ + if (max_regno < max_reg_num ()) + max_regno = max_reg_num (); + /* Try the NCE path if the CE path did not result in any changes. */ if (n_validated_changes == 0) { + rtx cond; + regset live; + bool success; + /* In the non-conditional execution case, we have to verify that there are no trapping operations, no calls, no references to memory, and that any registers modified are dead at the branch site. */ - rtx insn, cond, prev; - bitmap merge_set, merge_set_noclobber, test_live, test_set; - unsigned i, fail = 0; - bitmap_iterator bi; - - /* Check for no calls or trapping operations. */ - for (insn = head; ; insn = NEXT_INSN (insn)) - { - if (CALL_P (insn)) - return FALSE; - if (NONDEBUG_INSN_P (insn)) - { - if (may_trap_p (PATTERN (insn))) - return FALSE; - - /* ??? Even non-trapping memories such as stack frame - references must be avoided. For stores, we collect - no lifetime info; for reads, we'd have to assert - true_dependence false against every store in the - TEST range. */ - if (for_each_rtx (&PATTERN (insn), find_memory, NULL)) - return FALSE; - } - if (insn == end) - break; - } - - if (! any_condjump_p (jump)) + if (!any_condjump_p (jump)) return FALSE; /* Find the extent of the conditional. */ cond = noce_get_condition (jump, &earliest, false); - if (! cond) + if (!cond) return FALSE; - /* Collect: - MERGE_SET = set of registers set in MERGE_BB - MERGE_SET_NOCLOBBER = like MERGE_SET, but only includes registers - that are really set, not just clobbered. - TEST_LIVE = set of registers live at EARLIEST - TEST_SET = set of registers set between EARLIEST and the - end of the block. */ - - merge_set = BITMAP_ALLOC (®_obstack); - merge_set_noclobber = BITMAP_ALLOC (®_obstack); - test_live = BITMAP_ALLOC (®_obstack); - test_set = BITMAP_ALLOC (®_obstack); - - /* ??? bb->local_set is only valid during calculate_global_regs_live, - so we must recompute usage for MERGE_BB. Not so bad, I suppose, - since we've already asserted that MERGE_BB is small. */ - /* If we allocated new pseudos (e.g. in the conditional move - expander called from noce_emit_cmove), we must resize the - array first. 
*/ - if (max_regno < max_reg_num ()) - max_regno = max_reg_num (); - - FOR_BB_INSNS (merge_bb, insn) - { - if (NONDEBUG_INSN_P (insn)) - { - df_simulate_find_defs (insn, merge_set); - df_simulate_find_noclobber_defs (insn, merge_set_noclobber); - } - } - - /* For small register class machines, don't lengthen lifetimes of - hard registers before reload. */ - if (! reload_completed - && targetm.small_register_classes_for_mode_p (VOIDmode)) - { - EXECUTE_IF_SET_IN_BITMAP (merge_set_noclobber, 0, i, bi) - { - if (i < FIRST_PSEUDO_REGISTER - && ! fixed_regs[i] - && ! global_regs[i]) - fail = 1; - } - } - - /* For TEST, we're interested in a range of insns, not a whole block. - Moreover, we're interested in the insns live from OTHER_BB. */ - - /* The loop below takes the set of live registers - after JUMP, and calculates the live set before EARLIEST. */ - bitmap_copy (test_live, df_get_live_in (other_bb)); - df_simulate_initialize_backwards (test_bb, test_live); - for (insn = jump; ; insn = prev) - { - if (INSN_P (insn)) - { - df_simulate_find_defs (insn, test_set); - df_simulate_one_insn_backwards (test_bb, insn, test_live); - } - prev = PREV_INSN (insn); - if (insn == earliest) - break; - } - - /* We can perform the transformation if - MERGE_SET_NOCLOBBER & TEST_SET - and - MERGE_SET & TEST_LIVE) - and - TEST_SET & DF_LIVE_IN (merge_bb) - are empty. */ - - if (bitmap_intersect_p (test_set, merge_set_noclobber) - || bitmap_intersect_p (test_live, merge_set) - || bitmap_intersect_p (test_set, df_get_live_in (merge_bb))) - fail = 1; - - BITMAP_FREE (merge_set_noclobber); - BITMAP_FREE (merge_set); - BITMAP_FREE (test_live); - BITMAP_FREE (test_set); - - if (fail) + live = BITMAP_ALLOC (®_obstack); + simulate_backwards_to_point (merge_bb, live, end); + success = can_move_insns_across (head, end, earliest, jump, + merge_bb, live, + df_get_live_in (other_bb), NULL); + BITMAP_FREE (live); + if (!success) return FALSE; } Index: df.h =================================================================== --- df.h (revision 162372) +++ df.h (working copy) @@ -992,7 +992,9 @@ extern void df_simulate_one_insn_backwar extern void df_simulate_finalize_backwards (basic_block, bitmap); extern void df_simulate_initialize_forwards (basic_block, bitmap); extern void df_simulate_one_insn_forwards (basic_block, rtx, bitmap); - +extern void simulate_backwards_to_point (basic_block, regset, rtx); +extern bool can_move_insns_across (rtx, rtx, rtx, rtx, basic_block, regset, + regset, rtx *); /* Functions defined in df-scan.c. */ extern void df_scan_alloc (bitmap); Index: cfgcleanup.c =================================================================== --- cfgcleanup.c (revision 162372) +++ cfgcleanup.c (working copy) @@ -66,6 +66,10 @@ static bool first_pass; /* Set to true if crossjumps occured in the latest run of try_optimize_cfg. */ static bool crossjumps_occured; +/* Set to true if we couldn't run an optimization due to stale liveness + information; we should run df_analyze to enable more opportunities. */ +static bool block_was_dirty; + static bool try_crossjump_to_edge (int, edge, edge); static bool try_crossjump_bb (int, basic_block); static bool outgoing_edges_match (int, basic_block, basic_block); @@ -1927,6 +1931,261 @@ try_crossjump_bb (int mode, basic_block return changed; } +/* Search the successors of BB for common insn sequences. When found, + share code between them by moving it across the basic block + boundary. Return true if any changes made. 
*/ + +static bool +try_head_merge_bb (basic_block bb) +{ + basic_block final_dest_bb = NULL; + int max_match = INT_MAX; + edge e0; + rtx *headptr, *currptr; + bool changed, moveall; + unsigned ix; + rtx e0_last_head, cond, move_before; + unsigned nedges = EDGE_COUNT (bb->succs); + rtx jump = BB_END (bb); + regset live, live_union; + + /* Nothing to do if there is not at least two outgoing edges. */ + if (nedges < 2) + return false; + + /* Don't crossjump if this block ends in a computed jump, + unless we are optimizing for size. */ + if (optimize_bb_for_size_p (bb) + && bb != EXIT_BLOCK_PTR + && computed_jump_p (BB_END (bb))) + return false; + + cond = get_condition (jump, &move_before, true, false); + if (cond == NULL_RTX) + move_before = jump; + + for (ix = 0; ix < nedges; ix++) + { + edge e = EDGE_SUCC (bb, ix); + basic_block other_bb = e->dest; + + if (df_get_bb_dirty (other_bb)) + { + block_was_dirty = true; + return false; + } + + if (e->flags & EDGE_ABNORMAL) + return false; + + /* Normally, all destination blocks must only be reachable from this + block, i.e. they must have one incoming edge. + + There is one special case we can handle, that of multiple consecutive + jumps where the first jumps to one of the targets of the second jump. + This happens frequently in switch statements for default labels. + The structure is as follows: + FINAL_DEST_BB + .... + if (cond) jump A; + fall through + BB + jump with targets A, B, C, D... + A + has two incoming edges, from FINAL_DEST_BB and BB + + In this case, we can try to move the insns through BB and into + FINAL_DEST_BB. */ + if (EDGE_COUNT (other_bb->preds) != 1) + { + edge incoming_edge, incoming_bb_other_edge; + edge_iterator ei; + + if (final_dest_bb != NULL + || EDGE_COUNT (other_bb->preds) != 2) + return false; + + /* We must be able to move the insns across the whole block. */ + move_before = BB_HEAD (bb); + while (!NONDEBUG_INSN_P (move_before)) + move_before = NEXT_INSN (move_before); + + FOR_EACH_EDGE (incoming_edge, ei, bb->preds) + if (incoming_edge->dest == bb) + break; + final_dest_bb = incoming_edge->src; + if (EDGE_COUNT (final_dest_bb->succs) != 2) + return false; + FOR_EACH_EDGE (incoming_bb_other_edge, ei, final_dest_bb->succs) + if (incoming_bb_other_edge != incoming_edge) + break; + if (incoming_bb_other_edge->dest != other_bb) + return false; + } + } + + e0 = EDGE_SUCC (bb, 0); + e0_last_head = NULL_RTX; + changed = false; + + for (ix = 1; ix < nedges; ix++) + { + edge e = EDGE_SUCC (bb, ix); + rtx e0_last, e_last; + int nmatch; + + nmatch = flow_find_head_matching_sequence (e0->dest, e->dest, + &e0_last, &e_last, 0); + if (nmatch == 0) + return false; + + if (nmatch < max_match) + { + max_match = nmatch; + e0_last_head = e0_last; + } + } + + /* If we matched an entire block, we probably have to avoid moving the + last insn. */ + if (max_match > 0 + && e0_last_head == BB_END (e0->dest) + && (find_reg_note (e0_last_head, REG_EH_REGION, 0) + || control_flow_insn_p (e0_last_head))) + { + max_match--; + if (max_match == 0) + return false; + do + e0_last_head = prev_real_insn (e0_last_head); + while (DEBUG_INSN_P (e0_last_head)); + } + + if (max_match == 0) + return false; + + /* We must find a union of the live registers at each of the end points. 
*/ + live = BITMAP_ALLOC (NULL); + live_union = BITMAP_ALLOC (NULL); + + currptr = XNEWVEC (rtx, nedges); + headptr = XNEWVEC (rtx, nedges); + + for (ix = 0; ix < nedges; ix++) + { + int j; + basic_block merge_bb = EDGE_SUCC (bb, ix)->dest; + rtx head = BB_HEAD (merge_bb); + + while (!NONDEBUG_INSN_P (head)) + head = NEXT_INSN (head); + headptr[ix] = head; + currptr[ix] = head; + + /* Compute the end point and live information */ + for (j = 1; j < max_match; j++) + do + head = NEXT_INSN (head); + while (!NONDEBUG_INSN_P (head)); + simulate_backwards_to_point (merge_bb, live, head); + IOR_REG_SET (live_union, live); + } + + /* If we're moving across two blocks, verify the validity of the + first move, then adjust the target and let the loop below deal + with the final move. */ + if (final_dest_bb != NULL) + { + rtx move_upto; + + moveall = can_move_insns_across (currptr[0], e0_last_head, move_before, + jump, e0->dest, live_union, + NULL, &move_upto); + if (!moveall) + e0_last_head = move_upto; + if (e0_last_head == NULL_RTX) + goto out; + + jump = BB_END (final_dest_bb); + cond = get_condition (jump, &move_before, true, false); + if (cond == NULL_RTX) + move_before = jump; + } + + do + { + rtx move_upto; + moveall = can_move_insns_across (currptr[0], e0_last_head, + move_before, jump, e0->dest, live_union, + NULL, &move_upto); + if (!moveall && move_upto == NULL_RTX) + { + if (jump == move_before) + break; + + /* Try again, using a different insertion point. */ + move_before = jump; + continue; + } + + if (final_dest_bb && !moveall) + /* We haven't checked whether a partial move would be OK for the first + move, so we have to fail this case. */ + break; + + changed = true; + for (;;) + { + if (currptr[0] == move_upto) + break; + for (ix = 0; ix < nedges; ix++) + { + rtx curr = currptr[ix]; + do + curr = NEXT_INSN (curr); + while (!NONDEBUG_INSN_P (curr)); + currptr[ix] = curr; + } + } + + reorder_insns (headptr[0], currptr[0], PREV_INSN (move_before)); + df_set_bb_dirty (EDGE_SUCC (bb, 0)->dest); + if (final_dest_bb != NULL) + df_set_bb_dirty (final_dest_bb); + df_set_bb_dirty (bb); + for (ix = 1; ix < nedges; ix++) + { + df_set_bb_dirty (EDGE_SUCC (bb, ix)->dest); + delete_insn_chain (headptr[ix], currptr[ix], false); + } + if (!moveall) + { + if (jump == move_before) + break; + + /* Try again, using a different insertion point. */ + move_before = jump; + for (ix = 0; ix < nedges; ix++) + { + rtx curr = currptr[ix]; + do + curr = NEXT_INSN (curr); + while (!NONDEBUG_INSN_P (curr)); + currptr[ix] = headptr[ix] = curr; + } + } + } + while (!moveall); + + out: + free (currptr); + free (headptr); + + crossjumps_occured |= changed; + + return changed; +} + /* Return true if BB contains just bb note, or bb note followed by only DEBUG_INSNs. */ @@ -1972,6 +2231,7 @@ try_optimize_cfg (int mode) one predecessor, they may be combined. */ do { + block_was_dirty = false; changed = false; iterations++; @@ -2170,6 +2430,13 @@ try_optimize_cfg (int mode) && try_crossjump_bb (mode, b)) changed_here = true; + if ((mode & CLEANUP_CROSSJUMP) + /* This can lengthen register lifetimes. Do it only after + reload. */ + && reload_completed + && try_head_merge_bb (b)) + changed_here = true; + /* Don't get confused by the index shift caused by deleting blocks. 
*/ if (!changed_here) @@ -2182,6 +2449,9 @@ try_optimize_cfg (int mode) && try_crossjump_bb (mode, EXIT_BLOCK_PTR)) changed = true; + if (block_was_dirty) + df_analyze (); + #ifdef ENABLE_CHECKING if (changed) verify_flow_info (); @@ -2366,8 +2636,7 @@ cleanup_cfg (int mode) if ((mode & CLEANUP_EXPENSIVE) && !reload_completed && !delete_trivially_dead_insns (get_insns (), max_reg_num ())) break; - else if ((mode & CLEANUP_CROSSJUMP) - && crossjumps_occured) + if ((mode & CLEANUP_CROSSJUMP) && crossjumps_occured) run_fast_dce (); } else Index: df-problems.c =================================================================== --- df-problems.c (revision 162372) +++ df-problems.c (working copy) @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "basic-block.h" #include "sbitmap.h" #include "bitmap.h" +#include "target.h" #include "timevar.h" #include "df.h" #include "except.h" @@ -3804,6 +3805,27 @@ df_simulate_find_defs (rtx insn, bitmap } } +/* Find the set of uses for INSN. This includes partial defs. */ + +static void +df_simulate_find_uses (rtx insn, bitmap uses) +{ + df_ref *rec; + unsigned int uid = INSN_UID (insn); + + for (rec = DF_INSN_UID_DEFS (uid); *rec; rec++) + { + df_ref def = *rec; + if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) + bitmap_set_bit (uses, DF_REF_REGNO (def)); + } + for (rec = DF_INSN_UID_USES (uid); *rec; rec++) + { + df_ref use = *rec; + bitmap_set_bit (uses, DF_REF_REGNO (use)); + } +} + /* Find the set of real DEFs, which are not clobbers, for INSN. */ void @@ -4031,7 +4053,297 @@ df_simulate_one_insn_forwards (basic_blo } df_simulate_fixup_sets (bb, live); } + +/* Used by the next two functions to encode information about the + memory references we found. */ +#define MEMREF_NORMAL 1 +#define MEMREF_VOLATILE 2 + +/* A subroutine of can_move_insns_across_p called through for_each_rtx. + Return either MEMREF_NORMAL or MEMREF_VOLATILE if a memory is found. */ + +static int +find_memory (rtx *px, void *data ATTRIBUTE_UNUSED) +{ + rtx x = *px; + if (!MEM_P (x)) + return 0; + if (MEM_VOLATILE_P (x)) + return MEMREF_VOLATILE; + if (MEM_READONLY_P (x)) + return 0; + + return MEMREF_NORMAL; +} + +/* A subroutine of can_move_insns_across_p called through note_stores. + DATA points to an integer in which we set either the bit for + MEMREF_NORMAL or the bit for MEMREF_VOLATILE if we find a MEM + of either kind. */ + +static void +find_memory_stores (rtx x, const_rtx pat ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) +{ + int *pflags = (int *)data; + if (GET_CODE (x) == SUBREG) + x = XEXP (x, 0); + /* Treat stores to SP as stores to memory, this will prevent problems + when there are references to the stack frame. */ + if (x == stack_pointer_rtx) + *pflags |= MEMREF_VOLATILE; + if (!MEM_P (x)) + return; + *pflags |= MEM_VOLATILE_P (x) ? MEMREF_VOLATILE : MEMREF_NORMAL; +} + +/* Scan BB backwards, using df_simulate functions to keep track of + lifetimes, up to insn POINT. The result is stored in LIVE. */ + +void +simulate_backwards_to_point (basic_block bb, regset live, rtx point) +{ + rtx insn; + bitmap_copy (live, df_get_live_out (bb)); + df_simulate_initialize_backwards (bb, live); + + /* Scan and update life information until we reach the point we're + interested in. 
+  for (insn = BB_END (bb); insn != point; insn = PREV_INSN (insn))
+    df_simulate_one_insn_backwards (bb, insn, live);
+}
+
+/* Return true if it is safe to move a group of insns, described by
+   the range FROM to TO, backwards across another group of insns,
+   described by ACROSS_FROM to ACROSS_TO.  It is assumed that there
+   are no insns between ACROSS_TO and FROM, but they may be in
+   different basic blocks; MERGE_BB is the block from which the
+   insns will be moved.  The caller must pass in a regset MERGE_LIVE
+   which specifies the registers live after TO.
+
+   This function may be called in one of two cases: either we try to
+   move identical instructions from all successor blocks into their
+   predecessor, or we try to move from only one successor block.  If
+   OTHER_BRANCH_LIVE is nonnull, it indicates that we're dealing with
+   the second case.  It should contain a set of registers live at the
+   end of ACROSS_TO which must not be clobbered by moving the insns.
+   In that case, we're also more careful about moving memory references
+   and trapping insns.
+
+   We return false if it is not safe to move the entire group, but it
+   may still be possible to move a subgroup.  PMOVE_UPTO, if nonnull,
+   is set to point at the last movable insn in such a case.  */
+
+bool
+can_move_insns_across (rtx from, rtx to, rtx across_from, rtx across_to,
+                       basic_block merge_bb, regset merge_live,
+                       regset other_branch_live, rtx *pmove_upto)
+{
+  rtx insn, next, max_to;
+  bitmap merge_set, merge_use, local_merge_live;
+  bitmap test_set, test_use;
+  unsigned i, fail = 0;
+  bitmap_iterator bi;
+  int memrefs_in_across = 0;
+  int mem_sets_in_across = 0;
+  bool trapping_insns_in_across = false;
+
+  if (pmove_upto != NULL)
+    *pmove_upto = NULL_RTX;
+
+  /* Find real bounds, ignoring debug insns.  */
+  while (!NONDEBUG_INSN_P (from) && from != to)
+    from = NEXT_INSN (from);
+  while (!NONDEBUG_INSN_P (to) && from != to)
+    to = PREV_INSN (to);
+
+  for (insn = across_to; ; insn = next)
+    {
+      if (NONDEBUG_INSN_P (insn))
+        {
+          memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
+                                             NULL);
+          note_stores (PATTERN (insn), find_memory_stores,
+                       &mem_sets_in_across);
+          /* This is used just to find sets of the stack pointer.  */
+          memrefs_in_across |= mem_sets_in_across;
+          trapping_insns_in_across |= may_trap_p (PATTERN (insn));
+        }
+      next = PREV_INSN (insn);
+      if (insn == across_from)
+        break;
+    }
+
+  /* Collect:
+     MERGE_SET = set of registers set in MERGE_BB
+     MERGE_USE = set of registers used in MERGE_BB and live at its top
+     MERGE_LIVE = set of registers live at the point inside the MERGE
+     range that we've reached during scanning
+     TEST_SET = set of registers set between ACROSS_FROM and ACROSS_TO
+     TEST_USE = set of registers used between ACROSS_FROM and ACROSS_TO,
+     and live before ACROSS_FROM.  */
+
+  merge_set = BITMAP_ALLOC (&reg_obstack);
+  merge_use = BITMAP_ALLOC (&reg_obstack);
+  local_merge_live = BITMAP_ALLOC (&reg_obstack);
+  test_set = BITMAP_ALLOC (&reg_obstack);
+  test_use = BITMAP_ALLOC (&reg_obstack);
+
+  /* Compute the set of registers set and used in the ACROSS range.  */
+  if (other_branch_live != NULL)
+    bitmap_copy (test_use, other_branch_live);
+  df_simulate_initialize_backwards (merge_bb, test_use);
+  for (insn = across_to; ; insn = next)
+    {
+      if (NONDEBUG_INSN_P (insn))
+        {
+          df_simulate_find_defs (insn, test_set);
+          df_simulate_defs (insn, test_use);
+          df_simulate_uses (insn, test_use);
+        }
+      next = PREV_INSN (insn);
+      if (insn == across_from)
+        break;
+    }
+
+  /* Compute an upper bound for the number of insns moved, by finding
+     the first insn in MERGE that sets a register in TEST_USE, or uses
+     a register in TEST_SET.  We also check for calls, trapping operations,
+     and memory references.  */
+  max_to = NULL_RTX;
+  for (insn = from; ; insn = next)
+    {
+      if (CALL_P (insn))
+        break;
+      if (NONDEBUG_INSN_P (insn))
+        {
+          if (may_trap_p (PATTERN (insn))
+              && (trapping_insns_in_across || other_branch_live != NULL))
+            break;
+
+          /* We cannot move memory stores past each other, or move memory
+             reads past stores, at least not without tracking them and
+             calling true_dependence on every pair.
+
+             If there is no other branch and no memory references or
+             sets in the ACROSS range, we can move memory references
+             freely, even volatile ones.
+
+             Otherwise, the rules are as follows: volatile memory
+             references and stores can't be moved at all, and any type
+             of memory reference can't be moved if there are volatile
+             accesses or stores in the ACROSS range.  That leaves
+             normal reads, which can be moved, as the trapping case is
+             dealt with elsewhere.  */
+          if (other_branch_live != NULL || memrefs_in_across != 0)
+            {
+              int mem_ref_flags = 0;
+              int mem_set_flags = 0;
+              note_stores (PATTERN (insn), find_memory_stores, &mem_set_flags);
+              mem_ref_flags = for_each_rtx (&PATTERN (insn), find_memory,
+                                            NULL);
+              /* Catch sets of the stack pointer.  */
+              mem_ref_flags |= mem_set_flags;
+
+              if ((mem_ref_flags | mem_set_flags) & MEMREF_VOLATILE)
+                break;
+              if ((memrefs_in_across & MEMREF_VOLATILE) && mem_ref_flags != 0)
+                break;
+              if (mem_set_flags != 0
+                  || (mem_sets_in_across != 0 && mem_ref_flags != 0))
+                break;
+            }
+          df_simulate_find_uses (insn, merge_use);
+          /* We're only interested in uses which use a value live at
+             the top, not one previously set in this block.  */
+          bitmap_and_compl_into (merge_use, merge_set);
+          df_simulate_find_defs (insn, merge_set);
+          if (bitmap_intersect_p (merge_set, test_use)
+              || bitmap_intersect_p (merge_use, test_set))
+            break;
+          max_to = insn;
+        }
+      next = NEXT_INSN (insn);
+      if (insn == to)
+        break;
+    }
+  if (max_to != to)
+    fail = 1;
+
+  if (max_to == NULL_RTX || (fail && pmove_upto == NULL))
+    goto out;
+
+  /* Now, lower this upper bound by also taking into account that
+     a range of insns moved across ACROSS must not leave a register
+     live at the end that will be clobbered in ACROSS.  We need to
+     find a point where TEST_SET & LIVE == 0.
+
+     Insns in the MERGE range that set registers which are also set
+     in the ACROSS range may still be moved as long as we also move
+     later insns which use the results of the set, and make the
+     register dead again.  This is verified by the condition stated
+     above.  We only need to test it for registers that are set in
+     the moved region.
+
+     MERGE_LIVE is provided by the caller and holds live registers after
+     TO.  */
+  bitmap_copy (local_merge_live, merge_live);
+  for (insn = to; insn != max_to; insn = PREV_INSN (insn))
+    df_simulate_one_insn_backwards (merge_bb, insn, local_merge_live);
+
+  /* We're not interested in registers that aren't set in the moved
+     region at all.  */
+  bitmap_and_into (local_merge_live, merge_set);
+  for (;;)
+    {
+      if (NONDEBUG_INSN_P (insn))
+        {
+          if (!bitmap_intersect_p (test_set, local_merge_live))
+            {
+              max_to = insn;
+              break;
+            }
+
+          df_simulate_one_insn_backwards (merge_bb, insn,
+                                          local_merge_live);
+        }
+      if (insn == from)
+        {
+          fail = 1;
+          goto out;
+        }
+      insn = PREV_INSN (insn);
+    }
+  if (max_to != to)
+    fail = 1;
+
+  if (pmove_upto)
+    *pmove_upto = max_to;
+
+  /* For small register class machines, don't lengthen lifetimes of
+     hard registers before reload.  */
+  if (! reload_completed
+      && targetm.small_register_classes_for_mode_p (VOIDmode))
+    {
+      EXECUTE_IF_SET_IN_BITMAP (merge_set, 0, i, bi)
+        {
+          if (i < FIRST_PSEUDO_REGISTER
+              && ! fixed_regs[i]
+              && ! global_regs[i])
+            fail = 1;
+        }
+    }
+
+ out:
+  BITMAP_FREE (merge_set);
+  BITMAP_FREE (merge_use);
+  BITMAP_FREE (local_merge_live);
+  BITMAP_FREE (test_set);
+  BITMAP_FREE (test_use);
+
+  return !fail;
+}
 
 /*----------------------------------------------------------------------------
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 162372)
+++ Makefile.in	(working copy)
@@ -3154,7 +3154,7 @@ df-core.o : df-core.c $(CONFIG_H) $(SYST
 df-problems.o : df-problems.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
-   $(TM_P_H) $(FLAGS_H) output.h $(EXCEPT_H) dce.h vecprim.h
+   $(TM_P_H) $(TARGET_H) $(FLAGS_H) output.h $(EXCEPT_H) dce.h vecprim.h
 df-scan.o : df-scan.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: testsuite/gcc.target/arm/headmerge-1.c
===================================================================
--- testsuite/gcc.target/arm/headmerge-1.c	(revision 0)
+++ testsuite/gcc.target/arm/headmerge-1.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "#120" 1 } } */
+
+extern void foo1 (int);
+extern void foo2 (int);
+
+void t (int x, int y)
+{
+  if (y < 5)
+    foo1 (120);
+  else
+    foo2 (120);
+}
Index: testsuite/gcc.target/arm/headmerge-2.c
===================================================================
--- testsuite/gcc.target/arm/headmerge-2.c	(revision 0)
+++ testsuite/gcc.target/arm/headmerge-2.c	(revision 0)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "120" 1 } } */
+
+extern void foo1 (int);
+extern void foo2 (int);
+extern void foo3 (int);
+extern void foo4 (int);
+extern void foo5 (int);
+extern void foo6 (int);
+
+void t (int x, int y)
+{
+  switch (y)
+    {
+    case 1:
+      foo1 (120);
+      break;
+    case 5:
+      foo2 (120);
+      break;
+    case 7:
+      foo3 (120);
+      break;
+    case 10:
+      foo4 (120);
+      break;
+    case 13:
+      foo5 (120);
+      break;
+    default:
+      foo6 (120);
+      break;
+    }
+}
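
For reference, here is a source-level sketch of what head merging does to the
headmerge-1.c test above.  This is a hand-worked illustration, not compiler
output; the pass itself works on RTL insns, and arg0 below merely stands in
for the first argument-passing register:

  /* Before head merging: each arm of the branch begins with the same
     argument-setup insn, so the constant 120 is loaded twice.  */
  if (y < 5)
    {
      arg0 = 120;
      foo1 (arg0);
    }
  else
    {
      arg0 = 120;
      foo2 (arg0);
    }

  /* After head merging: the common head insn has been moved into the
     predecessor block, ahead of the conditional jump, and the copies in
     the two arms deleted.  The constant is now loaded exactly once, which
     is what the scan-assembler-times directive checks.  */
  arg0 = 120;
  if (y < 5)
    foo1 (arg0);
  else
    foo2 (arg0);

headmerge-2.c checks the same thing across all six arms of the switch.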