From patchwork Mon Jul 24 11:29:52 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Monakov X-Patchwork-Id: 792739 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-458767-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="b4C1ss+R"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xGK2624ysz9s74 for ; Mon, 24 Jul 2017 21:31:09 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id; q=dns; s=default; b=WzHRgLbh6+uPTFF /q+TfmaHWPa3i7cy5RJ0R6RseJw3keWkO1NVQzXAaNLcRoeatSIeAQVH6StY9Tgj 3kUJJRODPePYPlFO8RCp494wrns6/Y8RcsAyPQnVUjDSYuJ6Um8BrfIGjMxG4Qaj 5/5n7XmK4SqzBXCLbz6RndcWxpMQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id; s=default; bh=Vugz0E6IeluI/wOO279Vm XYl/RM=; b=b4C1ss+R3SfU7wetcRhRsqheA6rgzqDYRNEFyHmTvhotNIAl7ETUk Ja261M2eVCMQRGyfeNaUdz9U1eAzPjJ6zwhov7uGxF0/B/LvcIkHSUqnS2hwYRk/ Uh0HquDB+Q1MRvP5BnEqEvOUy1b7XifTehZETMDuhWmcUm37prhJY0= Received: (qmail 87060 invoked by alias); 24 Jul 2017 11:30:53 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 70376 invoked by uid 89); 24 Jul 2017 11:30:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=walker, vast X-HELO: smtp.ispras.ru Received: from bran.ispras.ru (HELO smtp.ispras.ru) (83.149.199.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 24 Jul 2017 11:30:38 +0000 Received: from monopod.intra.ispras.ru (monopod.intra.ispras.ru [10.10.3.121]) by smtp.ispras.ru (Postfix) with ESMTP id 6E1425FB6D for ; Mon, 24 Jul 2017 14:30:24 +0300 (MSK) From: Alexander Monakov To: gcc-patches@gcc.gnu.org Subject: [PATCH] Optimize BB sorting in domwalk Date: Mon, 24 Jul 2017 14:29:52 +0300 Message-Id: <20170724112952.9218-1-amonakov@ispras.ru> Profiling uses of qsort in GCC reveals that a significant portion of calls comes from domwalk.c where child nodes in dominator tree are reordered according to postorder numbering. However we know that in dominator trees the vast majority of nodes have 0, 2 or 3 children, and handling those cases separately allows to avoid qsort overhead. I don't have trunk figures handy, but on gcc-6 release-checking tramp3d compilation this would eliminate about 63% of _all_ qsort calls by count, or about 27% by time. Call count profile looks like: N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total 16111 9406 228607 91870 20154 8317 3823 406971 With this patch it would look like N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total 16111 9406 32200 34132 16966 8022 3702 149177 While at it, I've also added a clarifying comment to bb_postorder (it actually gives reverse postorder), simplified casts in cmp_bb_postorder and changed it to compute the result in a branchless fashion. Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are unchanged except for domwalk.o and *-checksum.o. OK for trunk? * domwalk.c (cmp_bb_postorder): Simplify. (sort_bbs_postorder): New function. Use it... (dom_walker::walk): ...here to optimize common cases. --- gcc/domwalk.c | 53 ++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/gcc/domwalk.c b/gcc/domwalk.c index a0daae6..dfb1993 100644 --- a/gcc/domwalk.c +++ b/gcc/domwalk.c @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3. If not see which is currently an abstraction over walking tree statements. Thus the dominator walker is currently only useful for trees. */ +/* Reverse postorder index of each basic block. */ static int *bb_postorder; static int cmp_bb_postorder (const void *a, const void *b) { - basic_block bb1 = *(basic_block *)const_cast(a); - basic_block bb2 = *(basic_block *)const_cast(b); - if (bb1->index == bb2->index) - return 0; + basic_block bb1 = *(const basic_block *)(a); + basic_block bb2 = *(const basic_block *)(b); + int n1 = bb_postorder[bb1->index], n2 = bb_postorder[bb2->index]; /* Place higher completion number first (pop off lower number first). */ - if (bb_postorder[bb1->index] > bb_postorder[bb2->index]) - return -1; - return 1; + return (n1 < n2) - (n1 > n2); +} + +/* Permute array BBS of N basic blocks in postorder, + i.e. by descending number in BB_POSTORDER array. */ + +static void +sort_bbs_postorder (basic_block *bbs, int n) +{ + if (__builtin_expect (n == 2, true)) + { + basic_block bb0 = bbs[0], bb1 = bbs[1]; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + bbs[0] = bb1, bbs[1] = bb0; + } + else if (__builtin_expect (n == 3, true)) + { + basic_block t, bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2]; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + t = bb0, bb0 = bb1, bb1 = t; + if (bb_postorder[bb1->index] < bb_postorder[bb2->index]) + { + t = bb1, bb1 = bb2, bb2 = t; + if (bb_postorder[bb0->index] < bb_postorder[bb1->index]) + t = bb0, bb0 = bb1, bb1 = t; + } + bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2; + } + else + qsort (bbs, n, sizeof *bbs, cmp_bb_postorder); } /* Constructor for a dom walker. @@ -284,16 +311,8 @@ dom_walker::walk (basic_block bb) for (dest = first_dom_son (m_dom_direction, bb); dest; dest = next_dom_son (m_dom_direction, dest)) worklist[sp++] = dest; - if (m_dom_direction == CDI_DOMINATORS) - switch (sp - saved_sp) - { - case 0: - case 1: - break; - default: - qsort (&worklist[saved_sp], sp - saved_sp, - sizeof (basic_block), cmp_bb_postorder); - } + if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS) + sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp); } /* NULL is used to mark pop operations in the recursion stack. */ while (sp > 0 && !worklist[sp - 1])