Message ID | 20170724112952.9218-1-amonakov@ispras.ru |
---|---|
State | New |
Headers | show |
Not commenting on the correctness... but On Mon, Jul 24, 2017 at 1:29 PM, Alexander Monakov <amonakov@ispras.ru> wrote: > + basic_block bb0 = bbs[0], bb1 = bbs[1]; > + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) > + bbs[0] = bb1, bbs[1] = bb0; > + } > + else if (__builtin_expect (n == 3, true)) > + { > + basic_block t, bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2]; > + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) > + t = bb0, bb0 = bb1, bb1 = t; > + if (bb_postorder[bb1->index] < bb_postorder[bb2->index]) > + { > + t = bb1, bb1 = bb2, bb2 = t; > + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) > + t = bb0, bb0 = bb1, bb1 = t; > + } > + bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2; ... maybe use std::swap() in all four cases?
On 07/24/2017 05:29 AM, Alexander Monakov wrote: > Profiling uses of qsort in GCC reveals that a significant portion of calls > comes from domwalk.c where child nodes in dominator tree are reordered > according to postorder numbering. However we know that in dominator trees > the vast majority of nodes have 0, 2 or 3 children, and handling those > cases separately allows to avoid qsort overhead. > > I don't have trunk figures handy, but on gcc-6 release-checking tramp3d > compilation this would eliminate about 63% of _all_ qsort calls by count, > or about 27% by time. Call count profile looks like: > > N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total > 16111 9406 228607 91870 20154 8317 3823 406971 > > With this patch it would look like > > N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total > 16111 9406 32200 34132 16966 8022 3702 149177 > > While at it, I've also added a clarifying comment to bb_postorder (it > actually gives reverse postorder), simplified casts in cmp_bb_postorder > and changed it to compute the result in a branchless fashion. > > Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are > unchanged except for domwalk.o and *-checksum.o. OK for trunk? > > * domwalk.c (cmp_bb_postorder): Simplify. > (sort_bbs_postorder): New function. Use it... > (dom_walker::walk): ...here to optimize common cases. Is it really beneficial to write cmp_bb_postorder without simple and trivial to understand conditionals? And if it is, then doesn't that actually suggest that we should have a match.pd pattern to turn the branchy code into a branchless sequence? As Uli noted, we should be using std::swap. Can you please repost ? THanks, Jeff
diff --git a/gcc/domwalk.c b/gcc/domwalk.c index a0daae6..dfb1993 100644 --- a/gcc/domwalk.c +++ b/gcc/domwalk.c @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3. If not see which is currently an abstraction over walking tree statements. Thus the dominator walker is currently only useful for trees. */ +/* Reverse postorder index of each basic block. */ static int *bb_postorder; static int cmp_bb_postorder (const void *a, const void *b) { - basic_block bb1 = *(basic_block *)const_cast<void *>(a); - basic_block bb2 = *(basic_block *)const_cast<void *>(b); - if (bb1->index == bb2->index) - return 0; + basic_block bb1 = *(const basic_block *)(a); + basic_block bb2 = *(const basic_block *)(b); + int n1 = bb_postorder[bb1->index], n2 = bb_postorder[bb2->index]; /* Place higher completion number first (pop off lower number first). */ - if (bb_postorder[bb1->index] > bb_postorder[bb2->index]) - return -1; - return 1; + return (n1 < n2) - (n1 > n2); +} + +/* Permute array BBS of N basic blocks in postorder, + i.e. by descending number in BB_POSTORDER array. */ + +static void +sort_bbs_postorder (basic_block *bbs, int n) +{ + if (__builtin_expect (n == 2, true)) + { + basic_block bb0 = bbs[0], bb1 = bbs[1]; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + bbs[0] = bb1, bbs[1] = bb0; + } + else if (__builtin_expect (n == 3, true)) + { + basic_block t, bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2]; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + t = bb0, bb0 = bb1, bb1 = t; + if (bb_postorder[bb1->index] < bb_postorder[bb2->index]) + { + t = bb1, bb1 = bb2, bb2 = t; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + t = bb0, bb0 = bb1, bb1 = t; + } + bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2; + } + else + qsort (bbs, n, sizeof *bbs, cmp_bb_postorder); } /* Constructor for a dom walker. @@ -284,16 +311,8 @@ dom_walker::walk (basic_block bb) for (dest = first_dom_son (m_dom_direction, bb); dest; dest = next_dom_son (m_dom_direction, dest)) worklist[sp++] = dest; - if (m_dom_direction == CDI_DOMINATORS) - switch (sp - saved_sp) - { - case 0: - case 1: - break; - default: - qsort (&worklist[saved_sp], sp - saved_sp, - sizeof (basic_block), cmp_bb_postorder); - } + if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS) + sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp); } /* NULL is used to mark pop operations in the recursion stack. */ while (sp > 0 && !worklist[sp - 1])