[2/2] loops: Invoke lim after successful loop interchange

Hi,

this patch modifies the loop invariant pass so that is can operate
only on a single requested loop and its sub-loops and ignore the rest
of the function, much like it currently ignores basic blocks that are
not in any real loop.  It then invokes it from within the loop
interchange pass when it successfully swaps two loops.  This avoids
the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
(which are 19% and 15% faster than current master on an AMD zen2
machine) while not introducing a full LIM pass into the pass pipeline.

I have not modified the LIM data structures, this means that it still
contains vectors indexed by loop->num even though only a single loop
nest is actually processed.  I also did not replace the uses of
pre_and_rev_post_order_compute_fn with a function that would count a
postorder only for a given loop.  I can of course do so if the
approach is otherwise deemed viable.

The patch adds one additional global variable requested_loop to the
pass and then at various places behaves differently when it is set.  I
was considering storing the fake root loop into it for normal
operation, but since this loop often requires special handling anyway,
I came to the conclusion that the code would actually end up less
straightforward.

I have bootstrapped and tested the patch on x86_64-linux and a very
similar one on aarch64-linux.  I have also tested it by modifying the
tree_ssa_lim function to run loop_invariant_motion_from_loop on each
real outermost loop in a function and this variant also passed
bootstrap and all tests, including dump scans, of all languages.

I have built the entire SPEC 2006 FPrate monitoring the activity of
the LIM pass without and with the patch (on top of commit b642fca1c31
with which 526.blender_r and 538.imagick_r seemed to be failing) and
it only examined 0.2% more loops, 0.02% more BBs and even fewer
percent of statements because it is invoked only in a rather special
circumstance.  But the patch allows for more such need-based uses at
hopefully reasonable cost.

Since I do not have much experience with loop optimizers, I expect
that there will be requests to adjust the patch during the review.
Still, it fixes a performance regression against GCC 9 and so I hope
to address the concerns in time to get it into GCC 11.

Thanks,

Martin

gcc/ChangeLog:

2020-11-08  Martin Jambor  <mjambor@suse.cz>

	* gimple-loop-interchange.cc (pass_linterchange::execute): Call
	loop_invariant_motion_from_loop on affected loop nests.
	* tree-ssa-loop-im.c (requested_loop): New variable.
	(get_topmost_lim_loop): New function.
	(outermost_invariant_loop): Use it, cap discovered topmost loop at
	requested_loop.
	(determine_max_movement): Use get_topmost_lim_loop.
	(set_level): Assert that the selected loop is not outside of
	requested_loop.
	(compute_invariantness): Do not process loops outside of
	requested_loop, if non-NULL.
	(move_computations_worker): Likewise.
	(mark_ref_stored): Stop iteration at requested_loop, if non-NULL.
	(mark_ref_loaded): Likewise.
	(analyze_memory_references): If non-NULL, only process basic
	blocks and loops in requested_loop.  Compute contains_call bitmap.
	(do_store_motion): Only process requested_loop if non-NULL.
	(fill_always_executed_in): Likewise.  Also accept contains_call as
	a parameter rather than computing it.
	(tree_ssa_lim_initialize): New parameter which is stored into
	requested_loop.  Additonal dumping. Only initialize
	bb_loop_postorder for loops within requested_loop, if non-NULL.
	(tree_ssa_lim_finalize): Clear requested_loop, additional dumping.
	(loop_invariant_motion_from_loop): New function.
	(tree_ssa_lim): Move all functionality to
	loop_invariant_motion_from_loop, call it.
	* tree-ssa-loop-manip.h (loop_invariant_motion_from_loop): Declare.

---
 gcc/gimple-loop-interchange.cc |  30 +++++-
 gcc/tree-ssa-loop-im.c         | 176 ++++++++++++++++++++++++---------
 gcc/tree-ssa-loop-manip.h      |   2 +
 3 files changed, 156 insertions(+), 52 deletions(-)

Message ID	ri6pn4mz4j9.fsf@suse.cz
State	New
Headers	show Return-Path: <gcc-patches-bounces@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 849423887025 From: Martin Jambor <mjambor@suse.cz> To: GCC Patches <gcc-patches@gcc.gnu.org> Subject: [PATCH 2/2] loops: Invoke lim after successful loop interchange User-Agent: Notmuch/0.31 (https://notmuchmail.org) Emacs/26.3 (x86_64-suse-linux-gnu) Date: Mon, 09 Nov 2020 20:58:02 +0100 Message-ID: <ri6pn4mz4j9.fsf@suse.cz> MIME-Version: 1.0 Content-Type: text/plain Precedence: list Cc: Richard Biener <rguenther@suse.de> Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>
Series	[1/2] cfgloop: Extend loop iteration macros to loop only over sub-loops \| expand [1/2] cfgloop: Extend loop iteration macros to loop only over sub-loops [2/2] loops: Invoke lim after successful loop interchange

[2/2] loops: Invoke lim after successful loop interchange

Commit Message

Comments

Patch