From patchwork Thu Nov 10 17:10:00 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 693369
From: Wilco Dijkstra
To: GCC Patches
CC: nd
Subject: [PATCH][AArch64] Tweak Cortex-A57 vector cost
Date: Thu, 10 Nov 2016 17:10:00 +0000

The existing vector costs stop some beneficial vectorization.  This is
mostly due to the vector statement cost being set to 3 and to vector
loads having a higher cost than scalar loads.  As a result, even when we
vectorize 4x, the cost of the vectorized loop can be similar to the
scalar version, and we fail to vectorize.
For example, for a particular loop the costs for -mcpu=generic are:

note: Cost model analysis:
  Vector inside of loop cost: 146
  Vector prologue cost: 5
  Vector epilogue cost: 0
  Scalar iteration cost: 50
  Scalar outside cost: 0
  Vector outside cost: 5
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
note:   Runtime profitability threshold = 3
note:   Static estimate profitability threshold = 3
note: loop vectorized

While -mcpu=cortex-a57 reports:

note: Cost model analysis:
  Vector inside of loop cost: 294
  Vector prologue cost: 15
  Vector epilogue cost: 0
  Scalar iteration cost: 74
  Scalar outside cost: 0
  Vector outside cost: 15
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 31
note:   Runtime profitability threshold = 30
note:   Static estimate profitability threshold = 30
note: not vectorized: vectorization not profitable.
note: not vectorized: iteration count smaller than user specified loop
bound parameter or minimum profitable iterations (whichever is more
conservative).

Using a cost of 3 for a vector operation suggests that vector operations
are 3 times as expensive as scalar operations.  Since most vector
operations have a throughput similar to that of scalar operations, this
is not correct.

Using slightly lower values for these heuristics allows this loop and
many others to be vectorized.  On a proprietary benchmark the gain from
vectorizing this loop is around 15-30%, which shows that vectorizing it
is indeed beneficial.

ChangeLog:
2016-11-10  Wilco Dijkstra

	* config/aarch64/aarch64.c (cortexa57_vector_cost): Change
	vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
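To make the thresholds in the two dumps above easier to follow, here is a minimal sketch of a break-even calculation that reproduces them. It is modeled on my understanding of the vectorizer's profitability estimate and is not a copy of the GCC source. The struct and function names are hypothetical, the vectorization factor of 4 is an assumption taken from "vectorize 4x" in the text, and peeled prologue/epilogue iterations are ignored because both dumps report 0.

```c
/* Inputs as printed in the "Cost model analysis" dump.
   The struct and its field names are illustrative, not GCC's.  */
struct loop_costs
{
  int vec_inside_cost;     /* "Vector inside of loop cost" */
  int vec_outside_cost;    /* "Vector outside cost" */
  int scalar_iter_cost;    /* "Scalar iteration cost" */
  int scalar_outside_cost; /* "Scalar outside cost" */
};

/* Hypothetical helper: smallest scalar iteration count at which the
   vector loop is estimated to be cheaper.  Returns -1 when one vector
   iteration costs at least as much as VF scalar iterations.  */
static int
min_profitable_iters (struct loop_costs c, int vf)
{
  /* Cost saved per vector iteration compared with VF scalar ones.  */
  int gain = c.scalar_iter_cost * vf - c.vec_inside_cost;
  if (gain <= 0)
    return -1;
  if (c.vec_outside_cost <= 0)
    return 1;

  /* Extra one-off cost of the vector version, scaled by VF.  */
  int outside = (c.vec_outside_cost - c.scalar_outside_cost) * vf;
  int iters = outside / gain;

  /* Round up when the division leaves the vector version no cheaper.  */
  if (c.scalar_iter_cost * vf * iters <= c.vec_inside_cost * iters + outside)
    iters++;
  return iters;
}

/* Hypothetical helper: runtime threshold, assuming at least one full
   vector iteration is required and the check is "niters <= threshold".  */
static int
runtime_threshold (struct loop_costs c, int vf)
{
  int iters = min_profitable_iters (c, vf);
  if (iters < 0)
    return -1;
  if (iters < vf)
    iters = vf;
  return iters - 1;
}
```

With the -mcpu=generic values (146, 5, 50, 0) and an assumed VF of 4, this gives a minimum of 1 iteration and a threshold of 3.  With the -mcpu=cortex-a57 values (294, 15, 74, 0) it gives 31 and 30.  In the Cortex-A57 case one vector iteration costs 294 against 4 * 74 = 296 for the equivalent scalar work, so the estimated gain is only 2 cost units per vector iteration, which is why the threshold is so high.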
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   1, /* scalar_stmt_cost */
   4, /* scalar_load_cost */
   1, /* scalar_store_cost */
-  3, /* vec_stmt_cost */
+  2, /* vec_stmt_cost */
   3, /* vec_permute_cost */
   8, /* vec_to_scalar_cost */
   8, /* scalar_to_vec_cost */
-  5, /* vec_align_load_cost */
-  5, /* vec_unalign_load_cost */
+  4, /* vec_align_load_cost */
+  4, /* vec_unalign_load_cost */
   1, /* vec_unalign_store_cost */
   1, /* vec_store_cost */
   1, /* cond_taken_branch_cost */