From patchwork Fri Jun 17 22:18:58 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fang, Changpeng"
X-Patchwork-Id: 100872
From: "Fang, Changpeng"
To: "H.J. Lu"
CC: Richard Guenther, "gcc-patches@gcc.gnu.org"
Date: Fri, 17 Jun 2011 17:18:58 -0500
Subject: RE: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic
X-OriginatorOrg: amd.com
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi,

I added AVX256_SPLIT_UNALIGNED_STORE to ix86_tune_indices and put
m_COREI7, m_BDVER1 and m_GENERIC as the targets that enable it.

Is this OK?

Thanks,

Changpeng

From 91e715213bb37d089cb490e769b115d1d131918f Mon Sep 17 00:00:00 2001
From: Changpeng Fang
Date: Mon, 13 Jun 2011 13:13:32 -0700
Subject: [PATCH 2/2] pr49089: enable avx256 splitting unaligned load/store only when beneficial

	* config/i386/i386.h (ix86_tune_indices): Introduce
	X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL and
	X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL.
	(TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL): New definition.
	(TARGET_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL): New definition.

	* config/i386/i386.c (ix86_tune_features): Add entries for
	X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL and
	X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL.
	(ix86_option_override_internal): Enable avx256 unaligned load(store)
	splitting when TARGET_AVX256_SPLIT_UNALIGNED_LOAD(STORE)_OPTIMAL
	are set.
---
 gcc/config/i386/i386.c |   17 ++++++++++++++---
 gcc/config/i386/i386.h |    4 ++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b266b9..b50d349 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2088,7 +2088,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
      at -O3. For the moment, the prefetching seems badly tuned for Intel
      chips. */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL: Enable splitting 256-bit
+     unaligned load. It hurts the performance on Bulldozer. We need to
+     re-tune the generic options for current cpus! */
+  m_COREI7 | m_GENERIC,
+
+  /* X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL: Enable splitting 256-bit
+     unaligned store. */
+  m_COREI7 | m_BDVER1 | m_GENERIC
 };
 
 /* Feature tests against the various architecture variations. */
@@ -4194,9 +4203,11 @@ ix86_option_override_internal (bool main_args_p)
 	  if (flag_expensive_optimizations
 	      && !(target_flags_explicit & MASK_VZEROUPPER))
 	    target_flags |= MASK_VZEROUPPER;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
-	  if (!(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
+	  if (TARGET_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL
+	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	    target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
 	}
     }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8badcbb..b6e5570 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -312,6 +312,8 @@ enum ix86_tune_indices {
   X86_TUNE_OPT_AGU,
   X86_TUNE_VECTORIZE_DOUBLE,
   X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL,
+  X86_TUNE_AVX256_SPLIT_UNALIGNED_STORE_OPTIMAL,
 
   X86_TUNE_LAST
 };
@@ -410,6 +412,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
 #define TARGET_SOFTWARE_PREFETCHING_BENEFICIAL \
 	ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
+#define TARGET_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL]
 
 /* Feature tests against the various architecture variations. */
 enum ix86_arch_indices {
-- 
1.7.0.4