From patchwork Mon Jun 27 22:33:35 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fang, Changpeng"
X-Patchwork-Id: 102287
From: "Fang, Changpeng"
To: "Jagasia, Harsha", "'H.J. Lu'", "gcc-patches@gcc.gnu.org"
CC: "'hubicka@ucw.cz'", "'ubizjak@gmail.com'", "'hongjiu.lu@intel.com'"
Date: Mon, 27 Jun 2011 17:33:35 -0500
Subject: RE: Backport AVX256 load/store split patches to gcc 4.6 for
 performance boost on latest AMD/Intel hardware.
References: <20110620165806.6911.15686.sendpatchset@gccpike4.amd.com>,
 <63EE40A00BA43F49B85FACBB03F078B60816B33089@sausexmbp02.amd.com>
In-Reply-To: <63EE40A00BA43F49B85FACBB03F078B60816B33089@sausexmbp02.amd.com>

Hi,

Attached are the patches we propose to backport to the gcc 4.6 branch. They
are all related to avx256 unaligned load/store splitting. As we mentioned
before, the combined effect of these patches is positive on both AMD and
Intel CPUs on cpu2006 and polyhedron 2005.

0001-Split-32-byte-AVX-unaligned-load-store.patch
    Initial patch that implements unaligned load/store splitting.
0001-Don-t-assert-unaligned-256bit-load-store.patch
    Remove the assert.
0001-Fix-a-typo-in-mavx256-split-unaligned-store.patch
    Fix a typo.
0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch
    Disable unaligned load splitting for bdver1.

All these patches are in 4.7 trunk. Bootstrap and tests are ongoing on the
gcc 4.6 branch.

Is it OK to commit to the 4.6 branch as long as the tests pass?

Thanks,

Changpeng

From 50310fc367348b406fc88d54c3ab54d1a304ad52 Mon Sep 17 00:00:00 2001
From: Changpeng Fang
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

    * config/i386/i386.c (avx256_split_unaligned_load): New definition.
    (avx256_split_unaligned_store): New definition.
    (ix86_option_override_internal): Enable avx256 unaligned load(store)
    splitting only when avx256_split_unaligned_load(store) is set.
---
 gcc/config/i386/i386.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..3bc0b53 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2121,6 +2121,12 @@ static const unsigned int x86_arch_always_fancy_math_387
   = m_PENT | m_ATOM | m_PPRO | m_AMD_MULTIPLE | m_PENT4
   | m_NOCONA | m_CORE2I7 | m_GENERIC;
 
+static const unsigned int x86_avx256_split_unaligned_load
+  = m_COREI7 | m_GENERIC;
+
+static const unsigned int x86_avx256_split_unaligned_store
+  = m_COREI7 | m_BDVER1 | m_GENERIC;
+
 /* In case the average insn count for single function invocation is
    lower than this constant, emit fast (but longer) prologue and
    epilogue code.  */
@@ -4194,9 +4200,11 @@ ix86_option_override_internal (bool main_args_p)
           if (flag_expensive_optimizations
               && !(target_flags_explicit & MASK_VZEROUPPER))
             target_flags |= MASK_VZEROUPPER;
-          if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+          if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
+              && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
             target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-          if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+          if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
+              && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
             target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
         }
     }
-- 
1.7.0.4
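
For readers who want to see the effect of the second hunk in isolation, the
following is a minimal standalone sketch of the new default selection. It is
not GCC code: the bit values, the constant names (M_COREI7, M_BDVER1,
M_GENERIC, SPLIT_LOAD, SPLIT_STORE) and the function override_flags are
hypothetical stand-ins for the m_* tuning masks, the
MASK_AVX256_SPLIT_UNALIGNED_* target flags and the relevant part of
ix86_option_override_internal.

```c
/* Hypothetical stand-ins for GCC's m_* tuning bits; the real masks are
   defined in gcc/config/i386/i386.c and have different values.  */
#define M_COREI7   (1u << 0)
#define M_BDVER1   (1u << 1)
#define M_GENERIC  (1u << 2)

/* Hypothetical stand-ins for MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE.  */
#define SPLIT_LOAD   (1u << 0)
#define SPLIT_STORE  (1u << 1)

/* Tunings for which each kind of splitting is enabled by default,
   mirroring the two tables added by the patch.  */
static const unsigned int split_unaligned_load = M_COREI7 | M_GENERIC;
static const unsigned int split_unaligned_store
  = M_COREI7 | M_BDVER1 | M_GENERIC;

/* Return FLAGS with the default splitting bits added.  A bit is turned on
   only if the selected tuning asks for it and the user did not set that
   option explicitly (EXPLICIT_FLAGS records the explicitly set options).  */
unsigned int
override_flags (unsigned int tune_mask, unsigned int flags,
                unsigned int explicit_flags)
{
  if ((split_unaligned_load & tune_mask) && !(explicit_flags & SPLIT_LOAD))
    flags |= SPLIT_LOAD;
  if ((split_unaligned_store & tune_mask) && !(explicit_flags & SPLIT_STORE))
    flags |= SPLIT_STORE;
  return flags;
}
```

Under these assumptions, a bdver1 tuning with no explicit options yields only
the store-splitting bit, while a generic tuning yields both. An option the
user set explicitly is never overridden by the tuning default.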