From patchwork Fri Dec 20 06:19:22 2013
X-Patchwork-Submitter: Teresa Johnson
X-Patchwork-Id: 303911
In-Reply-To: <20131213011309.GA21107@kam.mff.cuni.cz>
References: <528BA299.7040606@redhat.com> <20131128140655.GA20730@kam.mff.cuni.cz> <529CB252.4070806@redhat.com> <20131213011309.GA21107@kam.mff.cuni.cz>
Date: Thu, 19 Dec 2013 22:19:22 -0800
Subject: Re: [PATCH i386] Enable -freorder-blocks-and-partition
From: Teresa Johnson
To: Jan Hubicka
Cc: Martin Liška, "gcc-patches@gcc.gnu.org"

On Thu, Dec 12, 2013 at 5:13 PM, Jan Hubicka wrote:
>> On Wed, Dec 11, 2013
>> at 1:21 AM, Martin Liška wrote:
>> > Hello,
>> > I prepared a collection of systemtap graphs for GIMP.
>> >
>> > 1) just my profile-based function reordering: 550 pages
>> > 2) just -freorder-blocks-and-partitions: 646 pages
>> > 3) just -fno-reorder-blocks-and-partitions: 638 pages
>> >
>> > Please see attached data.
>>
>> Thanks for the data. A few observations/questions:
>>
>> With both 1) (your (time-based?) reordering) and 2)
>> (-freorder-blocks-and-partitions) there is a fair number of accesses
>> out of the cold section. I'm not seeing so many accesses out of the
>> cold section in the apps I am looking at with splitting enabled. In
>
> I see you already committed the patch, so perhaps Martin's measurements assume
> the pass is off by default?
>
> I rebuilt GCC with profiledbootstrap and with the linker script unmapping
> .text.unlikely. I get an ICE in:
>
> (gdb) bt
> #0  diagnostic_set_caret_max_width(diagnostic_context*, int) () at ../../gcc/diagnostic.c:108
> #1  0x0000000000f68457 in diagnostic_initialize (context=0x18ae000, n_opts=n_opts@entry=1290) at ../../gcc/diagnostic.c:135
> #2  0x000000000100050e in general_init (argv0=) at ../../gcc/toplev.c:1110
> #3  toplev_main(int, char**) () at ../../gcc/toplev.c:1922
> #4  0x00007ffff774cbe5 in __libc_start_main () from /lib64/libc.so.6
> #5  0x0000000000f7898d in _start () at ../sysdeps/x86_64/start.S:122
>
> That is relatively early in the startup process. The function seems inlined, and
> it fails only on the second invocation; I did not have time to investigate further
> yet, while without -fprofile-use it starts...

I'll see if I can reproduce this and investigate, although at this point
that might have to wait until after my holiday vacation.

> On our periodic testers I see an off-noise improvement on crafty, 2200->2300,
> and a regression on vortex, 2900->2800, plus a code size increase.

I had only run cpu2006, not cpu2000. I'll see if I can reproduce this as well.
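As an aside, the "linker script unmapping .text.unlikely" trick mentioned above can be approximated with a GNU ld fragment along the lines of the following. This is an untested sketch of the general idea, not the actual script used: marking the cold partition NOLOAD keeps it out of the loaded process image, so any control transfer into .text.unlikely faults at run time and stray cold-section execution shows up as a crash rather than going unnoticed.

```
/* Hypothetical GNU ld script fragment (sketch only): collect GCC's
   cold partition into a NOLOAD output section so that executing any
   code placed in .text.unlikely faults immediately.  Section names
   follow the -freorder-blocks-and-partition convention; everything
   else here is illustrative.  */
SECTIONS
{
  .text.unlikely (NOLOAD) :
  {
    *(.text.unlikely)
    *(.text.unlikely.*)
  }
} INSERT AFTER .text;
```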
I have been investigating a few places where I saw accesses in the cold
split regions in internal benchmarks. Here are a couple, and how I have
addressed them so far:

1) loop unswitching

In this case, loop unswitching hoisted a branch from within the loop to
outside the loop, and in doing so effectively speculated it above several
other branches. In its original location it always went to only one of the
successors (biased 0%/100%). But when it was hoisted it sometimes took the
previously 0% path. This led to executing out of the cold region, since we
didn't update the branch probability when hoisting. I worked around this by
assigning a small non-zero probability after hoisting with the following
change: This should probably be refined (if prob_true is 100% we want to
assign a small non-zero probability to the false path), and 10% may be too
high (maybe give it 1%?).

2) More COMDAT issues

My earlier patch handled the case where the COMDAT had 0 counts because the
linker kept the copy in a different module. In that case we prevent the
guessed frequencies from being dropped by counts_to_freqs, and then later
mark any functions reached via non-zero callgraph edges as guessed. Finally,
when one such 0-count COMDAT is inlined, the call count is propagated to the
callee blocks using the guessed probabilities.

However, in this case there was a COMDAT with a very small non-zero count
that was being inlined into a much hotter call site. I believe this can
happen when a copy was IPA-inlined in the profile-gen compile: the copy in
that module gets some non-zero counts from the IPA-inlined instance, but the
out-of-line copy was eliminated by the linker (selected from a different
module). In this case the inliner was scaling the bb counts up quite a lot
when inlining.
The problem is that you most likely can't trust that the 0-count bbs in such
a case are really not executed by the call site the function is being
inlined into, since the counts are very small and correspond to a different
call site.

The question is how to address this. We can't simply suppress
counts_to_freqs from overwriting the guessed frequencies in this case, since
the profile counts are non-zero and would not match the guessed
probabilities. But we can't figure out which functions are called by much
hotter call sites (compared to their entry counts) until later, when the
callgraph is built, which is when we would know that we want to ignore the
profile counts and use the guessed probabilities instead.

The solution I came up with is to allow the profile counts to overwrite the
guessed probabilities in counts_to_freqs. Then, when we inline, we
re-estimate the probabilities in the callee if the call-site count is much
hotter than the entry count, and follow the same procedure we were using in
the 0-count case (propagate the call count into the callee bb counts via the
guessed probabilities). Is there a better solution?

Thanks,
Teresa

>
> Honza

Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 205590)
+++ tree-ssa-loop-unswitch.c	(working copy)
@@ -384,6 +384,8 @@ tree_unswitch_loop (struct loop *loop,
 
   extract_true_false_edges_from_block (unswitch_on, &edge_true, &edge_false);
   prob_true = edge_true->probability;
+  if (!prob_true)
+    prob_true = REG_BR_PROB_BASE/10;
   return loop_version (loop, unshare_expr (cond), NULL,
 		       prob_true, prob_true,
 		       REG_BR_PROB_BASE - prob_true, false);