From patchwork Fri Sep 15 01:24:36 2017
X-Patchwork-Submitter: Kugan Vivekanandarajah
X-Patchwork-Id: 814012
From: Kugan Vivekanandarajah
Date: Fri, 15 Sep 2017 11:24:36 +1000
Subject: [RFC][PATCH 0/5] Loop unrolling and memory load streams
To: "gcc-patches@gcc.gnu.org"

While loop unrolling helps to keep the pipeline busy on modern
processors, it can also increase the number of memory streams, causing
collisions in the hardware prefetcher that can hurt performance. This
patch series tries to detect this and limit loop unrolling.

Patch 1: Add separate params for the RTL unroller.

Patch 2: Add the number of available hw prefetchers to
cpu_prefetch_tune so it can be used in loop unrolling decisions.

Patch 3: Prevent the tree unroller from completely unrolling inner
loops if that results in excessive strided loads in the outer loop.

Patch 4: Change iv_analyze_result to take const_rtx. This is just to
make the next patch compile. No functional changes.

Patch 5: Add aarch64_loop_unroll_adjust to limit partial unrolling in
RTL based on the strided loads in the loop.

Bootstrapped and tested on aarch64-linux-gnu (with -funroll-all-loops).
Testing on x86_64-linux-gnu is ongoing.

Thanks,
Kugan
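
To make the problem concrete, here is a minimal sketch of the kind of
loop nest Patch 3 is concerned with, together with one possible way a
limit on the unroll factor could be computed. The names sum_rows and
limit_unroll_factor, the ROWS/COLS sizes, and the "streams must not
exceed prefetchers" rule are assumptions made for illustration only;
they are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 1024

/* Illustrative loop nest (hypothetical, not from the patches).  If the
   inner loop over J is completely unrolled, the outer loop body contains
   ROWS loads a[0][i], a[1][i], ..., each walking memory as its own
   stream.  */
long
sum_rows (int (*a)[COLS], size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < ROWS; j++)
      sum += a[j][i];
  return sum;
}

/* Hypothetical heuristic: cap the unroll factor so that the number of
   strided load streams after unrolling does not exceed the number of
   hardware prefetcher streams.  A result of 1 means "do not unroll".  */
unsigned
limit_unroll_factor (unsigned nunroll, unsigned strided_loads,
                     unsigned hw_prefetchers)
{
  assert (nunroll >= 1);

  /* Nothing to limit, or no information about the prefetcher.  */
  if (strided_loads == 0 || hw_prefetchers == 0)
    return nunroll;

  unsigned max_factor = hw_prefetchers / strided_loads;
  if (max_factor < 1)
    max_factor = 1;

  return nunroll < max_factor ? nunroll : max_factor;
}
```

Under these assumptions, a loop with 2 strided loads on a core with 8
prefetcher streams would have a requested unroll factor of 8 reduced
to 4, and a loop that already has more strided loads than prefetcher
streams would not be unrolled at all.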