From patchwork Fri Oct 22 00:02:41 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fang, Changpeng"
X-Patchwork-Id: 68787
From: "Fang, Changpeng"
To: "hubicka@ucw.cz", "gcc-patches@gcc.gnu.org"
CC: "rth@redhat.com", "vmakarov@redhat.com"
Date: Thu, 21 Oct 2010 19:02:41 -0500
Subject: [PATCH i386] Implementation of the pipeline description for bdver1 (Bulldozer processors)

Hi,

Attached is the patch that implements the pipeline description for Bulldozer processors (bdver1). The implementation is based on the currently available information and may be subject to change in the future.

The patch bootstraps successfully on x86_64-unknown-linux-gnu (not with -march=bdver1), and the patched compiler successfully builds cpu2006 with -O3 -march=bdver1.

Is it OK to commit to trunk?

Thanks,

Changpeng

From 54dbc2bd2cda8a3d8b857a5058ee651e89cc1e31 Mon Sep 17 00:00:00 2001
From: Changpeng Fang
Date: Thu, 21 Oct 2010 14:31:18 -0700
Subject: [PATCH] Implementation of the pipeline description for Bulldozer (bdver1)

* gcc/config/i386/bdver1.md: New file that implements the pipeline description for bdver1.
(bdver1_decode): New insn attribute.
(bdver1,bdver1_int,bdver1_load,bdver1_mult,bdver1_fp): New automata.
(bdver1-decode0): New cpu_unit.
(bdver1-decode1): Likewise.
(bdver1-decode2): Likewise.
(bdver1-decodev): Likewise.
(bdver1-ieu0): Likewise.
(bdver1-ieu1): Likewise.
(bdver1-agu0): Likewise.
(bdver1-agu1): Likewise.
(bdver1-mult): Likewise.
(bdver1-load0): Likewise.
(bdver1-load1): Likewise.
(bdver1-ffma0): Likewise.
(bdver1-ffma1): Likewise.
(bdver1-fmal0): Likewise.
(bdver1-fmal1): Likewise.
(bdver1-vector): New reservation.
(bdver1-direct1): Likewise.
(bdver1-direct): Likewise.
(bdver1-double): Likewise.
(bdver1-ieu): Likewise.
(bdver1-agu): Likewise.
(bdver1-load): Likewise.
(bdver1-load2): Likewise.
(bdver1-store): Likewise.
(bdver1-store2): Likewise.
(bdver1-fpsched): Likewise.
(bdver1-fpload): Likewise.
(bdver1-fpload2): Likewise.
(bdver1-ffma): Likewise.
(bdver1-fcvt): Likewise.
(bdver1-fmma): Likewise.
(bdver1-fxbar): Likewise.
(bdver1-fmal): Likewise.
(bdver1-fsto): Likewise.
(bdver1-fvector): Likewise.
(bdver1_call): New insn reservation.
(bdver1_push): Likewise.
(bdver1_pop): Likewise.
(bdver1_leave): Likewise.
(bdver1_lea): Likewise.
(bdver1_imul_DI): Likewise.
(bdver1_imul): Likewise.
(bdver1_imul_mem_DI): Likewise.
(bdver1_imul_mem): Likewise.
(bdver1_idiv): Likewise.
(bdver1_idiv_mem): Likewise.
(bdver1_str): Likewise.
(bdver1_idirect): Likewise.
(bdver1_ivector): Likewise.
(bdver1_idirect_loadmov): Likewise.
(bdver1_idirect_load): Likewise.
(bdver1_ivector_load): Likewise.
(bdver1_idirect_movstore): Likewise.
(bdver1_idirect_both): Likewise.
(bdver1_ivector_both): Likewise.
(bdver1_idirect_store): Likewise.
(bdver1_ivector_store): Likewise.
(bdver1_fldxf): Likewise.
(bdver1_fld): Likewise.
(bdver1_fstxf): Likewise.
(bdver1_fst): Likewise.
(bdver1_fist): Likewise.
(bdver1_fmov_bdver1): Likewise.
(bdver1_fadd_load): Likewise.
(bdver1_fadd): Likewise.
(bdver1_fmul_load): Likewise.
(bdver1_fmul): Likewise.
(bdver1_fsgn): Likewise.
(bdver1_fdiv_load): Likewise.
(bdver1_fdiv): Likewise.
(bdver1_fpspc_load): Likewise.
(bdver1_fpspc): Likewise.
(bdver1_fcmov_load): Likewise.
(bdver1_fcmov): Likewise.
(bdver1_fcomi_load): Likewise.
(bdver1_fcomi): Likewise.
(bdver1_fcom_load): Likewise.
(bdver1_fcom): Likewise.
(bdver1_fxch): Likewise.
(bdver1_ssevector_avx128_unaligned_load): Likewise.
(bdver1_ssevector_avx256_unaligned_load): Likewise.
(bdver1_ssevector_sse128_unaligned_load): Likewise.
(bdver1_ssevector_avx128_load): Likewise.
(bdver1_ssevector_avx256_load): Likewise.
(bdver1_ssevector_sse128_load): Likewise.
(bdver1_ssescalar_movq_load): Likewise.
(bdver1_ssescalar_vmovss_load): Likewise.
(bdver1_ssescalar_sse128_load): Likewise.
(bdver1_mmxsse_load): Likewise.
(bdver1_sse_store_avx256): Likewise.
(bdver1_sse_store): Likewise.
(bdver1_mmxsse_store_short): Likewise.
(bdver1_ssevector_avx256): Likewise.
(bdver1_movss_movsd): Likewise.
(bdver1_mmxssemov): Likewise.
(bdver1_sselog_load_256): Likewise.
(bdver1_sselog_256): Likewise.
(bdver1_sselog_load): Likewise.
(bdver1_sselog): Likewise.
(bdver1_ssecmp_load): Likewise.
(bdver1_ssecmp): Likewise.
(bdver1_ssecomi_load): Likewise.
(bdver1_ssecomi): Likewise.
(bdver1_vcvtX2Y_avx256_load): Likewise.
(bdver1_vcvtX2Y_avx256): Likewise.
(bdver1_ssecvt_cvtss2sd_load): Likewise.
(bdver1_ssecvt_cvtss2sd): Likewise.
(bdver1_sseicvt_cvtsi2sd_load): Likewise.
(bdver1_sseicvt_cvtsi2sd): Likewise.
(bdver1_ssecvt_cvtpd2ps_load): Likewise.
(bdver1_ssecvt_cvtpd2ps): Likewise.
(bdver1_ssecvt_cvtdq2ps_load): Likewise.
(bdver1_ssecvt_cvtdq2ps): Likewise.
(bdver1_ssecvt_cvtdq2pd_load): Likewise.
(bdver1_ssecvt_cvtdq2pd): Likewise.
(bdver1_ssecvt_cvtps2pd_load): Likewise.
(bdver1_ssecvt_cvtps2pd): Likewise.
(bdver1_ssecvt_cvtsX2si_load): Likewise.
(bdver1_ssecvt_cvtsX2si): Likewise.
(bdver1_ssecvt_cvtpd2pi_load): Likewise.
(bdver1_ssecvt_cvtpd2pi): Likewise.
(bdver1_ssecvt_cvtpd2dq_load): Likewise.
(bdver1_ssecvt_cvtpd2dq): Likewise.
(bdver1_ssecvt_cvtps2pi_load): Likewise.
(bdver1_ssecvt_cvtps2pi): Likewise.
(bdver1_ssemuladd_load_256): Likewise.
(bdver1_ssemuladd_256): Likewise.
(bdver1_ssemuladd_load): Likewise.
(bdver1_ssemuladd): Likewise.
(bdver1_sseimul_load): Likewise.
(bdver1_sseimul): Likewise.
(bdver1_sseiadd_load): Likewise.
(bdver1_sseiadd): Likewise.
(bdver1_ssediv_double_load_256): Likewise.
(bdver1_ssediv_double_256): Likewise.
(bdver1_ssediv_single_load_256): Likewise.
(bdver1_ssediv_single_256): Likewise.
(bdver1_ssediv_double_load): Likewise.
(bdver1_ssediv_double): Likewise.
(bdver1_ssediv_single_load): Likewise.
(bdver1_ssediv_single): Likewise.
(bdver1_sseins): Likewise.

* gcc/config/i386/i386.md (include "bdver1.md"): Invoke the pipeline description for bdver1.
(x86_sahf_1): Add "bdver1_decode" attribute.
(*cmpfp_i_mixed): Likewise.
(*cmpfp_i_sse): Likewise.
(*cmpfp_i_i387): Likewise.
(*cmpfp_iu_mixed): Likewise.
(*cmpfp_iu_sse): Likewise.
(*cmpfp_iu_387): Likewise.
(*swap,*swap_1): Likewise.
(fixuns_trunchi2): Likewise.
(fix_truncsi_sse): Likewise.
(x86_fnstcw_1): Likewise.
(x86_fldcw_1): Likewise.
(*floatsi2_vector_mixed_with_temp): Likewise.
(*floatsi2_vector_mixed): Likewise.
(*float2_mixed_with_temp): Likewise.
(*float2_mixed_interunit): Likewise.
(*float2_mixed_nointerunit): Likewise.
(*floatsi2_vector_sse_with_temp): Likewise.
(*floatsi2_vector_sse): Likewise.
(*float2_sse_with_temp): Likewise.
(*float2_sse_interunit): Likewise.
(*float2_sse_nointerunit): Likewise.
(*mul3_1): Likewise.
(*mulsi3_1_zext): Likewise.
(*mulhi3_1): Likewise.
(*mulqi3_1): Likewise.
(*mul3_1): Likewise.
(*mulqihi3_1): Likewise.
(*muldi3_highpart_1): Likewise.
(*mulsi3_highpart_1): Likewise.
(*mulsi3_highpart_zext): Likewise.
(x86_64_shld): Likewise.
(x86_shld): Likewise.
(x86_64_shrd): Likewise.
(x86_shrd): Likewise.
(sqrtxf2): Likewise.
(sqrt_extendxf2_i387): Likewise.
(*sqrt2_sse): Likewise.

* gcc/config/i386/sse.md (sse_cvtsi2ss): Add "bdver1_decode" attribute.
(sse_cvtsi2ssq): Likewise.
(sse_cvtss2si): Likewise.
(sse_cvtss2si_2): Likewise.
(sse_cvtss2siq): Likewise.
(sse_cvtss2siq_2): Likewise.
(sse_cvttss2si): Likewise.
(sse_cvttss2siq): Likewise.
(sse2_cvtpi2pd): Likewise.
(sse2_cvttpd2pi): Likewise.
(sse2_cvtsi2sd): Likewise.
(sse2_cvtsi2sdq): Likewise.
(sse2_cvtsd2si): Likewise.
(sse2_cvtsd2si_2): Likewise.
(sse2_cvtsd2siq): Likewise.
(sse2_cvtsd2siq_2): Likewise.
(sse2_cvttsd2si): Likewise.
(sse2_cvttsd2siq): Likewise.
(*sse2_cvtpd2dq): Likewise.
(*sse2_cvttpd2dq): Likewise.
(sse2_cvtsd2ss): Likewise.
(sse2_cvtss2sd): Likewise.
(*sse2_cvtpd2ps): Likewise.
(sse2_cvtps2pd): Likewise.
---
 gcc/config/i386/bdver1.md |  796 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/i386/i386.md   |   84 ++++--
 gcc/config/i386/sse.md    |   44 ++-
 3 files changed, 894 insertions(+), 30 deletions(-)
 create mode 100644 gcc/config/i386/bdver1.md

diff --git a/gcc/config/i386/bdver1.md b/gcc/config/i386/bdver1.md
new file mode 100644
index 0000000..3cde476
--- /dev/null
+++ b/gcc/config/i386/bdver1.md
@@ -0,0 +1,796 @@
+;; Copyright (C) 2010, Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;;
+;; AMD bdver1 Scheduling
+;;
+;; The bdver1 contains four pipelined FP units, two integer units and
+;; two address generation units.
+;;
+;; The predecode logic determines the boundaries of instructions in the 64
+;; byte cache line.  So the cache line straddling problem of K6 might be an issue
+;; here as well, but it is not noted in the documentation.
+;;
+;; Three DirectPath instruction decoders and only one VectorPath decoder
+;; are available.  They can decode three DirectPath instructions or one
+;; VectorPath instruction per cycle.
+;;
+;; The load/store queue unit is not attached to the schedulers but
+;; communicates with all the execution units separately instead.
+
+
+(define_attr "bdver1_decode" "direct,vector,double"
+  (const_string "direct"))
+
+(define_automaton "bdver1,bdver1_int,bdver1_load,bdver1_mult,bdver1_fp")
+
+(define_cpu_unit "bdver1-decode0" "bdver1")
+(define_cpu_unit "bdver1-decode1" "bdver1")
+(define_cpu_unit "bdver1-decode2" "bdver1")
+(define_cpu_unit "bdver1-decodev" "bdver1")
+
+;; Model the fact that a double decoded instruction may take 2 cycles
+;; to decode when decoder2 and decoder0 in the next cycle
+;; are used (this is needed to allow a throughput of 1.5 double decoded
+;; instructions per cycle).
+;;
+;; In order to avoid a dependence between the reservation of the decoder
+;; and other units, we model the decoder as a two-stage fully pipelined unit,
+;; and only a double decoded instruction may occupy the unit in the first cycle.
+;; With this scheme, however, two double instructions can be issued in cycle 0.
+;;
+;; Avoid this by using a presence set requiring decoder0 to be allocated
+;; too.  Vector decoded instructions then can't be issued when modeled
+;; as consuming decoder0+decoder1+decoder2.
+;; We solve that with a specialized vector decoder unit and an exclusion set.
+(presence_set "bdver1-decode2" "bdver1-decode0")
+(exclusion_set "bdver1-decodev" "bdver1-decode0,bdver1-decode1,bdver1-decode2")
+
+(define_reservation "bdver1-vector" "nothing,bdver1-decodev")
+(define_reservation "bdver1-direct1" "nothing,bdver1-decode1")
+(define_reservation "bdver1-direct" "nothing,
+                                    (bdver1-decode0 | bdver1-decode1
+                                    | bdver1-decode2)")
+;; Double instructions behave like two direct instructions.
+(define_reservation "bdver1-double" "((bdver1-decode2,bdver1-decode0)
+                                     | (nothing,(bdver1-decode0 + bdver1-decode1))
+                                     | (nothing,(bdver1-decode1 + bdver1-decode2)))")
+
+
+(define_cpu_unit "bdver1-ieu0" "bdver1_int")
+(define_cpu_unit "bdver1-ieu1" "bdver1_int")
+(define_reservation "bdver1-ieu" "(bdver1-ieu0 | bdver1-ieu1)")
+
+(define_cpu_unit "bdver1-agu0" "bdver1_int")
+(define_cpu_unit "bdver1-agu1" "bdver1_int")
+(define_reservation "bdver1-agu" "(bdver1-agu0 | bdver1-agu1)")
+
+(define_cpu_unit "bdver1-mult" "bdver1_mult")
+
+(define_cpu_unit "bdver1-load0" "bdver1_load")
+(define_cpu_unit "bdver1-load1" "bdver1_load")
+(define_reservation "bdver1-load" "bdver1-agu,
+                                  (bdver1-load0 | bdver1-load1),nothing")
+;; 128bit SSE instructions issue two loads at once.
+(define_reservation "bdver1-load2" "bdver1-agu,
+                                   (bdver1-load0 + bdver1-load1),nothing")
+
+(define_reservation "bdver1-store" "(bdver1-load0 | bdver1-load1)")
+;; 128bit SSE instructions issue two stores at once.
+(define_reservation "bdver1-store2" "(bdver1-load0 + bdver1-load1)")
+
+;; The FP operations start to execute at stage 12 in the pipeline, while
+;; integer operations start to execute at stage 9 for athlon and 11 for K8.
+;; Compensate for the difference for athlon because it results in significantly
+;; smaller automata.
+;; NOTE: the above information was just copied from athlon.md, and was not
+;; actually verified for bdver1.
+(define_reservation "bdver1-fpsched" "nothing,nothing,nothing")
+;; The floating point loads.
+(define_reservation "bdver1-fpload" "(bdver1-fpsched + bdver1-load)")
+(define_reservation "bdver1-fpload2" "(bdver1-fpsched + bdver1-load2)")
+
+;; Four FP units.
+(define_cpu_unit "bdver1-ffma0" "bdver1_fp")
+(define_cpu_unit "bdver1-ffma1" "bdver1_fp")
+(define_cpu_unit "bdver1-fmal0" "bdver1_fp")
+(define_cpu_unit "bdver1-fmal1" "bdver1_fp")
+
+(define_reservation "bdver1-ffma" "(bdver1-ffma0 | bdver1-ffma1)")
+(define_reservation "bdver1-fcvt" "bdver1-ffma0")
+(define_reservation "bdver1-fmma" "bdver1-ffma0")
+(define_reservation "bdver1-fxbar" "bdver1-ffma1")
+(define_reservation "bdver1-fmal" "(bdver1-fmal0 | bdver1-fmal1)")
+(define_reservation "bdver1-fsto" "bdver1-fmal1")
+
+;; Vector operations usually consume many of the pipes.
+(define_reservation "bdver1-fvector" "(bdver1-ffma0 + bdver1-ffma1
+                                      + bdver1-fmal0 + bdver1-fmal1)")
+
+;; Jump instructions are executed in the branch unit completely transparently to us.
+(define_insn_reservation "bdver1_call" 0
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "call,callv"))
+                         "bdver1-double,bdver1-agu,bdver1-ieu")
+;; PUSH mem is double path.
+(define_insn_reservation "bdver1_push" 1
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "push"))
+                         "bdver1-direct,bdver1-agu,bdver1-store")
+;; POP r16/mem are double path.
+(define_insn_reservation "bdver1_pop" 1
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "pop"))
+                         "bdver1-direct,(bdver1-ieu+bdver1-load)")
+;; LEAVE: no latency info so far; assume the same as amdfam10.
+(define_insn_reservation "bdver1_leave" 3
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "leave"))
+                         "bdver1-vector,(bdver1-ieu+bdver1-load)")
+;; LEA executes in the AGU unit with 1 cycle latency on BDVER1.
+(define_insn_reservation "bdver1_lea" 1
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "lea"))
+                         "bdver1-direct,bdver1-agu,nothing")
+
+;; MUL executes in a special multiplier unit attached to IEU1.
+(define_insn_reservation "bdver1_imul_DI" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imul")
+                                   (and (eq_attr "mode" "DI")
+                                        (eq_attr "memory" "none,unknown"))))
+                         "bdver1-direct1,bdver1-ieu1,bdver1-mult,nothing,bdver1-ieu1")
+(define_insn_reservation "bdver1_imul" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imul")
+                                   (eq_attr "memory" "none,unknown")))
+                         "bdver1-direct1,bdver1-ieu1,bdver1-mult,bdver1-ieu1")
+(define_insn_reservation "bdver1_imul_mem_DI" 10
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imul")
+                                   (and (eq_attr "mode" "DI")
+                                        (eq_attr "memory" "load,both"))))
+                         "bdver1-direct1,bdver1-load,bdver1-ieu,bdver1-mult,nothing,bdver1-ieu")
+(define_insn_reservation "bdver1_imul_mem" 8
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imul")
+                                   (eq_attr "memory" "load,both")))
+                         "bdver1-direct1,bdver1-load,bdver1-ieu,bdver1-mult,bdver1-ieu")
+
+;; IDIV cannot execute in parallel with other instructions.  Dealing with it
+;; as with a short latency vector instruction is a good approximation that keeps
+;; the scheduler from trying too hard to hide its latency by overlapping it with
+;; other instructions.
+;; ??? Experiments show that IDIV can overlap with roughly 6 cycles
+;; of the other code.
+(define_insn_reservation "bdver1_idiv" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "idiv")
+                                   (eq_attr "memory" "none,unknown")))
+                         "bdver1-vector,(bdver1-ieu0*6+(bdver1-fpsched,bdver1-fvector))")
+
+(define_insn_reservation "bdver1_idiv_mem" 10
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "idiv")
+                                   (eq_attr "memory" "load,both")))
+                         "bdver1-vector,((bdver1-load,bdver1-ieu0*6)+(bdver1-fpsched,bdver1-fvector))")
+
+;; The parallelism of string instructions is not documented.  Model it the same
+;; way as IDIV to create smaller automata.  This probably does not matter much.
+;; Using the same heuristics for bdver1 as amdfam10 and K8 with IDIV.
+(define_insn_reservation "bdver1_str" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "str")
+                                   (eq_attr "memory" "load,both,store")))
+                         "bdver1-vector,bdver1-load,bdver1-ieu0*6")
+
+;; Integer instructions.
+(define_insn_reservation "bdver1_idirect" 1
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "direct")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "none,unknown"))))
+                         "bdver1-direct,bdver1-ieu")
+(define_insn_reservation "bdver1_ivector" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "vector")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "none,unknown"))))
+                         "bdver1-vector,bdver1-ieu,bdver1-ieu")
+(define_insn_reservation "bdver1_idirect_loadmov" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imov")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-load")
+(define_insn_reservation "bdver1_idirect_load" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "direct")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-direct,bdver1-load,bdver1-ieu")
+(define_insn_reservation "bdver1_ivector_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "vector")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-vector,bdver1-load,bdver1-ieu,bdver1-ieu")
+(define_insn_reservation "bdver1_idirect_movstore" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "imov")
+                                   (eq_attr "memory" "store")))
+                         "bdver1-direct,bdver1-agu,bdver1-store")
+(define_insn_reservation "bdver1_idirect_both" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "direct")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "both"))))
+                         "bdver1-direct,bdver1-load,
+                          bdver1-ieu,bdver1-store,
+                          bdver1-store")
+(define_insn_reservation "bdver1_ivector_both" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "vector")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "both"))))
+                         "bdver1-vector,bdver1-load,
+                          bdver1-ieu,
+                          bdver1-ieu,
+                          bdver1-store")
+(define_insn_reservation "bdver1_idirect_store" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "direct")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "store"))))
+                         "bdver1-direct,(bdver1-ieu+bdver1-agu),
+                          bdver1-store")
+(define_insn_reservation "bdver1_ivector_store" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "vector")
+                                   (and (eq_attr "unit" "integer,unknown")
+                                        (eq_attr "memory" "store"))))
+                         "bdver1-vector,(bdver1-ieu+bdver1-agu),bdver1-ieu,
+                          bdver1-store")
+
+;; BDVER1 floating point units.
+(define_insn_reservation "bdver1_fldxf" 13
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fmov")
+                                   (and (eq_attr "memory" "load")
+                                        (eq_attr "mode" "XF"))))
+                         "bdver1-vector,bdver1-fpload2,bdver1-fvector*9")
+(define_insn_reservation "bdver1_fld" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fmov")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_fstxf" 8
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fmov")
+                                   (and (eq_attr "memory" "store,both")
+                                        (eq_attr "mode" "XF"))))
+                         "bdver1-vector,(bdver1-fpsched+bdver1-agu),(bdver1-store2+(bdver1-fvector*6))")
+(define_insn_reservation "bdver1_fst" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fmov")
+                                   (eq_attr "memory" "store,both")))
+                         "bdver1-double,(bdver1-fpsched+bdver1-agu),(bdver1-fsto+bdver1-store)")
+(define_insn_reservation "bdver1_fist" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fistp,fisttp"))
+                         "bdver1-double,(bdver1-fpsched+bdver1-agu),(bdver1-fsto+bdver1-store)")
+(define_insn_reservation "bdver1_fmov_bdver1" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fmov"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fadd_load" 10
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fop")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_fadd" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fop"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fmul_load" 10
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fmul")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-double,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_fmul" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fmul"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fsgn" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fsgn"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fdiv_load" 46
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fdiv")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_fdiv" 42
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fdiv"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fpspc_load" 103
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fpspc")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-vector,bdver1-fpload,bdver1-fvector")
+(define_insn_reservation "bdver1_fpspc" 100
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fpspc")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-vector,bdver1-fpload,bdver1-fvector")
+(define_insn_reservation "bdver1_fcmov_load" 17
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fcmov")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-vector,bdver1-fpload,bdver1-fvector")
+(define_insn_reservation "bdver1_fcmov" 15
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fcmov"))
+                         "bdver1-vector,bdver1-fpsched,bdver1-fvector")
+(define_insn_reservation "bdver1_fcomi_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fcmp")
+                                   (and (eq_attr "bdver1_decode" "double")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-double,bdver1-fpload,(bdver1-ffma | bdver1-fsto)")
+(define_insn_reservation "bdver1_fcomi" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "bdver1_decode" "double")
+                                   (eq_attr "type" "fcmp")))
+                         "bdver1-double,bdver1-fpsched,(bdver1-ffma | bdver1-fsto)")
+(define_insn_reservation "bdver1_fcom_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "fcmp")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_fcom" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fcmp"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_fxch" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "fxch"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+
+;; SSE loads.
+(define_insn_reservation "bdver1_ssevector_avx128_unaligned_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "prefix" "vex")
+                                        (and (eq_attr "movu" "1")
+                                             (and (eq_attr "mode" "V4SF,V2DF")
+                                                  (eq_attr "memory" "load"))))))
+                         "bdver1-direct,bdver1-fpload")
+(define_insn_reservation "bdver1_ssevector_avx256_unaligned_load" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "movu" "1")
+                                        (and (eq_attr "mode" "V8SF,V4DF")
+                                             (eq_attr "memory" "load")))))
+                         "bdver1-double,bdver1-fpload")
+(define_insn_reservation "bdver1_ssevector_sse128_unaligned_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "movu" "1")
+                                        (and (eq_attr "mode" "V4SF,V2DF")
+                                             (eq_attr "memory" "load")))))
+                         "bdver1-direct,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_ssevector_avx128_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "prefix" "vex")
+                                        (and (eq_attr "mode" "V4SF,V2DF,TI")
+                                             (eq_attr "memory" "load")))))
+                         "bdver1-direct,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_ssevector_avx256_load" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "V8SF,V4DF,OI")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-double,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_ssevector_sse128_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "V4SF,V2DF,TI")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-direct,bdver1-fpload")
+(define_insn_reservation "bdver1_ssescalar_movq_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "DI")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-direct,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_ssescalar_vmovss_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "prefix" "vex")
+                                        (and (eq_attr "mode" "SF")
+                                             (eq_attr "memory" "load")))))
+                         "bdver1-direct,bdver1-fpload")
+(define_insn_reservation "bdver1_ssescalar_sse128_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "SF,DF")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-direct,bdver1-fpload, bdver1-ffma")
+(define_insn_reservation "bdver1_mmxsse_load" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "mmxmov,ssemov")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload, bdver1-fmal")
+
+;; SSE stores.
+(define_insn_reservation "bdver1_sse_store_avx256" 5
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "V8SF,V4DF,OI")
+                                        (eq_attr "memory" "store,both"))))
+                         "bdver1-double,(bdver1-fpsched+bdver1-agu),((bdver1-fsto+bdver1-store)*2)")
+(define_insn_reservation "bdver1_sse_store" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "V4SF,V2DF,TI")
+                                        (eq_attr "memory" "store,both"))))
+                         "bdver1-direct,(bdver1-fpsched+bdver1-agu),((bdver1-fsto+bdver1-store)*2)")
+(define_insn_reservation "bdver1_mmxsse_store_short" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "mmxmov,ssemov")
+                                   (eq_attr "memory" "store,both")))
+                         "bdver1-direct,(bdver1-fpsched+bdver1-agu),(bdver1-fsto+bdver1-store)")
+
+;; Register moves.
+(define_insn_reservation "bdver1_ssevector_avx256" 3
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "V8SF,V4DF,OI")
+                                        (eq_attr "memory" "none"))))
+                         "bdver1-double,bdver1-fpsched,bdver1-fmal")
+(define_insn_reservation "bdver1_movss_movsd" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssemov")
+                                   (and (eq_attr "mode" "SF,DF")
+                                        (eq_attr "memory" "none"))))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_mmxssemov" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "mmxmov,ssemov")
+                                   (eq_attr "memory" "none")))
+                         "bdver1-direct,bdver1-fpsched,bdver1-fmal")
+;; SSE logs.
+(define_insn_reservation "bdver1_sselog_load_256" 7
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "sselog,sselog1")
+                                   (and (eq_attr "mode" "V8SF")
+                                        (eq_attr "memory" "load"))))
+                         "bdver1-double,bdver1-fpload,bdver1-fmal")
+(define_insn_reservation "bdver1_sselog_256" 3
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "sselog,sselog1")
+                                   (eq_attr "mode" "V8SF")))
+                         "bdver1-double,bdver1-fpsched,bdver1-fmal")
+(define_insn_reservation "bdver1_sselog_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "sselog,sselog1")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-fxbar")
+(define_insn_reservation "bdver1_sselog" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "sselog,sselog1"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-fxbar")
+
+;; PCMP actually executes in FMAL.
+(define_insn_reservation "bdver1_ssecmp_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssecmp")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-direct,bdver1-fpload,bdver1-ffma")
+(define_insn_reservation "bdver1_ssecmp" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "ssecmp"))
+                         "bdver1-direct,bdver1-fpsched,bdver1-ffma")
+(define_insn_reservation "bdver1_ssecomi_load" 6
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssecomi")
+                                   (eq_attr "memory" "load")))
+                         "bdver1-double,bdver1-fpload,(bdver1-ffma | bdver1-fsto)")
+(define_insn_reservation "bdver1_ssecomi" 2
+                         (and (eq_attr "cpu" "bdver1")
+                              (eq_attr "type" "ssecomi"))
+                         "bdver1-double,bdver1-fpsched,(bdver1-ffma | bdver1-fsto)")
+
+;; Conversions behave very irregularly and the scheduling is critical here.
+;; Take each instruction separately.
+
+;; 256 bit conversion.
+(define_insn_reservation "bdver1_vcvtX2Y_avx256_load" 8
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssecvt")
+                                   (and (eq_attr "memory" "load")
+                                        (ior (ior (match_operand:V4DF 0 "register_operand")
+                                                  (ior (match_operand:V8SF 0 "register_operand")
+                                                       (match_operand:V8SI 0 "register_operand")))
+                                             (ior (match_operand:V4DF 1 "nonimmediate_operand")
+                                                  (ior (match_operand:V8SF 1 "nonimmediate_operand")
+                                                       (match_operand:V8SI 1 "nonimmediate_operand")))))))
+                         "bdver1-vector,bdver1-fpload,bdver1-fvector")
+(define_insn_reservation "bdver1_vcvtX2Y_avx256" 4
+                         (and (eq_attr "cpu" "bdver1")
+                              (and (eq_attr "type" "ssecvt")
+                                   (and (eq_attr "memory" "none")
+                                        (ior (ior (match_operand:V4DF 0 "register_operand")
+                                                  (ior (match_operand:V8SF 0 "register_operand")
+                                                       (match_operand:V8SI 0 "register_operand")))
+                                             (ior (match_operand:V4DF 1 "nonimmediate_operand")
+                                                  (ior (match_operand:V8SF 1 "nonimmediate_operand")
+                                                       (match_operand:V8SI 1 "nonimmediate_operand")))))))
+                         "bdver1-vector,bdver1-fpsched,bdver1-fvector")
+;; CVTSS2SD, CVTSD2SS.
+(define_insn_reservation "bdver1_ssecvt_cvtss2sd_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SF,DF") + (eq_attr "memory" "load")))) + "bdver1-direct,bdver1-fpload,bdver1-fcvt") +(define_insn_reservation "bdver1_ssecvt_cvtss2sd" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "mode" "SF,DF") + (eq_attr "memory" "none")))) + "bdver1-direct,bdver1-fpsched,bdver1-fcvt") +;; CVTSI2SD, CVTSI2SS, CVTSI2SDQ, CVTSI2SSQ. +(define_insn_reservation "bdver1_sseicvt_cvtsi2sd_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseicvt") + (and (eq_attr "mode" "SF,DF") + (eq_attr "memory" "load")))) + "bdver1-direct,bdver1-fpload,bdver1-fcvt") +(define_insn_reservation "bdver1_sseicvt_cvtsi2sd" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseicvt") + (and (eq_attr "mode" "SF,DF") + (eq_attr "memory" "none")))) + "bdver1-double,bdver1-fpsched,(nothing | bdver1-fcvt)") +;; CVTPD2PS. +(define_insn_reservation "bdver1_ssecvt_cvtpd2ps_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V4SF 0 "register_operand") + (match_operand:V2DF 1 "nonimmediate_operand"))))) + "bdver1-double,bdver1-fpload,(bdver1-fxbar | bdver1-fcvt)") +(define_insn_reservation "bdver1_ssecvt_cvtpd2ps" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V4SF 0 "register_operand") + (match_operand:V2DF 1 "nonimmediate_operand"))))) + "bdver1-double,bdver1-fpsched,(bdver1-fxbar | bdver1-fcvt)") +;; CVTPI2PS, CVTDQ2PS. 
+(define_insn_reservation "bdver1_ssecvt_cvtdq2ps_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V4SF 0 "register_operand") + (ior (match_operand:V2SI 1 "nonimmediate_operand") + (match_operand:V4SI 1 "nonimmediate_operand")))))) + "bdver1-direct,bdver1-fpload,bdver1-fcvt") +(define_insn_reservation "bdver1_ssecvt_cvtdq2ps" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V4SF 0 "register_operand") + (ior (match_operand:V2SI 1 "nonimmediate_operand") + (match_operand:V4SI 1 "nonimmediate_operand")))))) + "bdver1-direct,bdver1-fpsched,bdver1-fcvt") +;; CVTDQ2PD. +(define_insn_reservation "bdver1_ssecvt_cvtdq2pd_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V2DF 0 "register_operand") + (match_operand:V4SI 1 "nonimmediate_operand"))))) + "bdver1-double,bdver1-fpload,(bdver1-fxbar | bdver1-fcvt)") +(define_insn_reservation "bdver1_ssecvt_cvtdq2pd" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V2DF 0 "register_operand") + (match_operand:V4SI 1 "nonimmediate_operand"))))) + "bdver1-double,bdver1-fpsched,(bdver1-fxbar | bdver1-fcvt)") +;; CVTPS2PD, CVTPI2PD. 
+(define_insn_reservation "bdver1_ssecvt_cvtps2pd_load" 6 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V2DF 0 "register_operand") + (ior (match_operand:V2SI 1 "nonimmediate_operand") + (match_operand:V4SF 1 "nonimmediate_operand")))))) + "bdver1-double,bdver1-fpload,(bdver1-fxbar | bdver1-fcvt)") +(define_insn_reservation "bdver1_ssecvt_cvtps2pd" 2 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V2DF 0 "register_operand") + (ior (match_operand:V2SI 1 "nonimmediate_operand") + (match_operand:V4SF 1 "nonimmediate_operand")))))) + "bdver1-double,bdver1-fpsched,(bdver1-fxbar | bdver1-fcvt)") +;; CVTSD2SI, CVTSD2SIQ, CVTSS2SI, CVTSS2SIQ, CVTTSD2SI, CVTTSD2SIQ, CVTTSS2SI, CVTTSS2SIQ. +(define_insn_reservation "bdver1_ssecvt_cvtsX2si_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseicvt") + (and (eq_attr "mode" "SI,DI") + (eq_attr "memory" "load")))) + "bdver1-double,bdver1-fpload,(bdver1-fcvt | bdver1-fsto)") +(define_insn_reservation "bdver1_ssecvt_cvtsX2si" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseicvt") + (and (eq_attr "mode" "SI,DI") + (eq_attr "memory" "none")))) + "bdver1-double,bdver1-fpsched,(bdver1-fcvt | bdver1-fsto)") +;; CVTPD2PI, CVTTPD2PI.
+(define_insn_reservation "bdver1_ssecvt_cvtpd2pi_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V2DF 1 "nonimmediate_operand") + (match_operand:V2SI 0 "register_operand"))))) + "bdver1-double,bdver1-fpload,(bdver1-fcvt | bdver1-fxbar)") +(define_insn_reservation "bdver1_ssecvt_cvtpd2pi" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V2DF 1 "nonimmediate_operand") + (match_operand:V2SI 0 "register_operand"))))) + "bdver1-double,bdver1-fpsched,(bdver1-fcvt | bdver1-fxbar)") +;; CVTPD2DQ, CVTTPD2DQ. +(define_insn_reservation "bdver1_ssecvt_cvtpd2dq_load" 6 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V2DF 1 "nonimmediate_operand") + (match_operand:V4SI 0 "register_operand"))))) + "bdver1-double,bdver1-fpload,(bdver1-fcvt | bdver1-fxbar)") +(define_insn_reservation "bdver1_ssecvt_cvtpd2dq" 2 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V2DF 1 "nonimmediate_operand") + (match_operand:V4SI 0 "register_operand"))))) + "bdver1-double,bdver1-fpsched,(bdver1-fcvt | bdver1-fxbar)") +;; CVTPS2PI, CVTTPS2PI, CVTPS2DQ, CVTTPS2DQ. 
+(define_insn_reservation "bdver1_ssecvt_cvtps2pi_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "load") + (and (match_operand:V4SF 1 "nonimmediate_operand") + (ior (match_operand:V2SI 0 "register_operand") + (match_operand:V4SI 0 "register_operand")))))) + "bdver1-direct,bdver1-fpload,bdver1-fcvt") +(define_insn_reservation "bdver1_ssecvt_cvtps2pi" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssecvt") + (and (eq_attr "memory" "none") + (and (match_operand:V4SF 1 "nonimmediate_operand") + (ior (match_operand:V2SI 0 "register_operand") + (match_operand:V4SI 0 "register_operand")))))) + "bdver1-direct,bdver1-fpsched,bdver1-fcvt") + +;; SSE MUL, ADD, and MULADD. +(define_insn_reservation "bdver1_ssemuladd_load_256" 11 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssemul,sseadd,ssemuladd") + (and (eq_attr "mode" "V8SF,V4DF") + (eq_attr "memory" "load")))) + "bdver1-double,bdver1-fpload,bdver1-ffma") +(define_insn_reservation "bdver1_ssemuladd_256" 7 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssemul,sseadd,ssemuladd") + (and (eq_attr "mode" "V8SF,V4DF") + (eq_attr "memory" "none")))) + "bdver1-double,bdver1-fpsched,bdver1-ffma") +(define_insn_reservation "bdver1_ssemuladd_load" 10 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssemul,sseadd,ssemuladd") + (eq_attr "memory" "load"))) + "bdver1-direct,bdver1-fpload,bdver1-ffma") +(define_insn_reservation "bdver1_ssemuladd" 6 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssemul,sseadd,ssemuladd") + (eq_attr "memory" "none"))) + "bdver1-direct,bdver1-fpsched,bdver1-ffma") +(define_insn_reservation "bdver1_sseimul_load" 8 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseimul") + (eq_attr "memory" "load"))) + "bdver1-direct,bdver1-fpload,bdver1-fmma") +(define_insn_reservation "bdver1_sseimul" 4 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseimul") + (eq_attr "memory" "none"))) +
"bdver1-direct,bdver1-fpsched,bdver1-fmma") +(define_insn_reservation "bdver1_sseiadd_load" 6 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseiadd") + (eq_attr "memory" "load"))) + "bdver1-direct,bdver1-fpload,bdver1-fmal") +(define_insn_reservation "bdver1_sseiadd" 2 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseiadd") + (eq_attr "memory" "none"))) + "bdver1-direct,bdver1-fpsched,bdver1-fmal") + +;; SSE DIV: no throughput information (assume same as amdfam10). +(define_insn_reservation "bdver1_ssediv_double_load_256" 31 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF") + (eq_attr "memory" "load")))) + "bdver1-double,bdver1-fpload,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_double_256" 27 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V4DF") + (eq_attr "memory" "none")))) + "bdver1-double,bdver1-fpsched,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_single_load_256" 28 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF") + (eq_attr "memory" "load")))) + "bdver1-double,bdver1-fpload,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_single_256" 24 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "V8SF") + (eq_attr "memory" "none")))) + "bdver1-double,bdver1-fpsched,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_double_load" 31 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "DF,V2DF") + (eq_attr "memory" "load")))) + "bdver1-direct,bdver1-fpload,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_double" 27 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "DF,V2DF") + (eq_attr "memory" "none")))) + "bdver1-direct,bdver1-fpsched,(bdver1-ffma0*17 | 
bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_single_load" 28 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "SF,V4SF") + (eq_attr "memory" "load")))) + "bdver1-direct,bdver1-fpload,(bdver1-ffma0*17 | bdver1-ffma1*17)") +(define_insn_reservation "bdver1_ssediv_single" 24 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "SF,V4SF") + (eq_attr "memory" "none")))) + "bdver1-direct,bdver1-fpsched,(bdver1-ffma0*17 | bdver1-ffma1*17)") + +(define_insn_reservation "bdver1_sseins" 3 + (and (eq_attr "cpu" "bdver1") + (and (eq_attr "type" "sseins") + (eq_attr "mode" "TI"))) + "bdver1-direct,bdver1-fpsched,bdver1-fxbar") + diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index d97e96f..475e530 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -928,6 +928,7 @@ (include "ppro.md") (include "k6.md") (include "athlon.md") +(include "bdver1.md") (include "geode.md") (include "atom.md") @@ -1456,6 +1457,7 @@ [(set_attr "length" "1") (set_attr "athlon_decode" "vector") (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "SI")]) ;; Pentium Pro can do steps 1 through 3 in one go. 
@@ -1486,7 +1488,8 @@ ] (const_string "0"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_insn "*cmpfp_i_sse" [(set (reg:CCFP FLAGS_REG) @@ -1508,7 +1511,8 @@ (const_string "1") (const_string "0"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_insn "*cmpfp_i_i387" [(set (reg:CCFP FLAGS_REG) @@ -1528,7 +1532,8 @@ ] (const_string "XF"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_insn "*cmpfp_iu_mixed" [(set (reg:CCFPU FLAGS_REG) @@ -1556,7 +1561,8 @@ ] (const_string "0"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_insn "*cmpfp_iu_sse" [(set (reg:CCFPU FLAGS_REG) @@ -1578,7 +1584,8 @@ (const_string "1") (const_string "0"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_insn "*cmpfp_iu_387" [(set (reg:CCFPU FLAGS_REG) @@ -1598,7 +1605,8 @@ ] (const_string "XF"))) (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct")]) ;; Push/pop instructions. 
@@ -2352,7 +2360,8 @@ (set_attr "mode" "") (set_attr "pent_pair" "np") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "double")]) + (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "double")]) (define_insn "*swap_1" [(set (match_operand:SWI12 0 "register_operand" "+r") @@ -2365,7 +2374,8 @@ (set_attr "mode" "SI") (set_attr "pent_pair" "np") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "double")]) + (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "double")]) ;; Not added amdfam10_decode since TARGET_PARTIAL_REG_STALL ;; is disabled for AMDFAM10 @@ -4560,7 +4570,8 @@ (set_attr "prefix_rex" "1") (set_attr "mode" "") (set_attr "athlon_decode" "double,vector") - (set_attr "amdfam10_decode" "double,double")]) + (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double")]) (define_insn "fix_truncsi_sse" [(set (match_operand:SI 0 "register_operand" "=r,r") @@ -4572,7 +4583,8 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "") (set_attr "athlon_decode" "double,vector") - (set_attr "amdfam10_decode" "double,double")]) + (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double")]) ;; Shorten x87->SSE reload sequences of fix_trunc?f?i_sse patterns. (define_peephole2 @@ -4827,7 +4839,8 @@ [(set (attr "length") (symbol_ref "ix86_attr_length_address_default (insn) + 2")) (set_attr "mode" "HI") - (set_attr "unit" "i387")]) + (set_attr "unit" "i387") + (set_attr "bdver1_decode" "vector")]) (define_insn "x86_fldcw_1" [(set (reg:HI FPCR_REG) @@ -4839,7 +4852,8 @@ (set_attr "mode" "HI") (set_attr "unit" "i387") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "vector")]) + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) ;; Conversion between fixed point and floating point. 
@@ -4993,6 +5007,7 @@ (set_attr "unit" "*,i387,*,*,*") (set_attr "athlon_decode" "*,*,double,direct,double") (set_attr "amdfam10_decode" "*,*,vector,double,double") + (set_attr "bdver1_decode" "*,*,double,direct,double") (set_attr "fp_int_src" "true")]) (define_insn "*floatsi2_vector_mixed" @@ -5008,6 +5023,7 @@ (set_attr "unit" "i387,*") (set_attr "athlon_decode" "*,direct") (set_attr "amdfam10_decode" "*,double") + (set_attr "bdver1_decode" "*,direct") (set_attr "fp_int_src" "true")]) (define_insn "*float2_mixed_with_temp" @@ -5023,6 +5039,7 @@ (set_attr "unit" "*,i387,*,*") (set_attr "athlon_decode" "*,*,double,direct") (set_attr "amdfam10_decode" "*,*,vector,double") + (set_attr "bdver1_decode" "*,*,double,direct") (set_attr "fp_int_src" "true")]) (define_split @@ -5075,6 +5092,7 @@ (set_attr "unit" "i387,*,*") (set_attr "athlon_decode" "*,double,direct") (set_attr "amdfam10_decode" "*,vector,double") + (set_attr "bdver1_decode" "*,double,direct") (set_attr "fp_int_src" "true")]) (define_insn "*float2_mixed_nointerunit" @@ -5098,6 +5116,7 @@ (const_string "*"))) (set_attr "athlon_decode" "*,direct") (set_attr "amdfam10_decode" "*,double") + (set_attr "bdver1_decode" "*,direct") (set_attr "fp_int_src" "true")]) (define_insn "*floatsi2_vector_sse_with_temp" @@ -5112,6 +5131,7 @@ (set_attr "mode" ",,") (set_attr "athlon_decode" "double,direct,double") (set_attr "amdfam10_decode" "vector,double,double") + (set_attr "bdver1_decode" "double,direct,double") (set_attr "fp_int_src" "true")]) (define_insn "*floatsi2_vector_sse" @@ -5124,6 +5144,7 @@ (set_attr "mode" "") (set_attr "athlon_decode" "direct") (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "fp_int_src" "true")]) (define_split @@ -5259,6 +5280,7 @@ (set_attr "mode" "") (set_attr "athlon_decode" "double,direct") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "double,direct") (set_attr "fp_int_src" "true")]) (define_insn "*float2_sse_interunit" @@ 
-5280,6 +5302,7 @@ (const_string "*"))) (set_attr "athlon_decode" "double,direct") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "double,direct") (set_attr "fp_int_src" "true")]) (define_split @@ -5314,6 +5337,7 @@ (const_string "*"))) (set_attr "athlon_decode" "direct") (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "fp_int_src" "true")]) (define_split @@ -6851,6 +6875,8 @@ ;; IMUL reg32/64, mem32/64, imm32 VectorPath ;; IMUL reg32/64, reg32/64 Direct ;; IMUL reg32/64, mem32/64 Direct +;; +;; On BDVER1, all above IMULs use DirectPath (define_insn "*mul3_1" [(set (match_operand:SWI48 0 "register_operand" "=r,r,r") @@ -6879,6 +6905,7 @@ (match_operand 1 "memory_operand" "")) (const_string "vector")] (const_string "direct"))) + (set_attr "bdver1_decode" "direct") (set_attr "mode" "")]) (define_insn "*mulsi3_1_zext" @@ -6909,6 +6936,7 @@ (match_operand 1 "memory_operand" "")) (const_string "vector")] (const_string "direct"))) + (set_attr "bdver1_decode" "direct") (set_attr "mode" "SI")]) ;; On AMDFAM10 @@ -6918,6 +6946,8 @@ ;; IMUL reg16, mem16, imm16 VectorPath ;; IMUL reg16, reg16 Direct ;; IMUL reg16, mem16 Direct +;; +;; On BDVER1, all HI MULs use DoublePath (define_insn "*mulhi3_1" [(set (match_operand:HI 0 "register_operand" "=r,r,r") @@ -6942,9 +6972,10 @@ (cond [(eq_attr "alternative" "0,1") (const_string "vector")] (const_string "direct"))) + (set_attr "bdver1_decode" "double") (set_attr "mode" "HI")]) -;;On AMDFAM10 +;;On AMDFAM10 and BDVER1 ;; MUL reg8 Direct ;; MUL mem8 Direct @@ -6963,6 +6994,7 @@ (const_string "vector") (const_string "direct"))) (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "QI")]) (define_expand "mul3" @@ -7001,6 +7033,7 @@ (const_string "vector") (const_string "double"))) (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "")]) (define_insn "*mulqihi3_1" @@ -7021,6 +7054,7 @@ 
(const_string "vector") (const_string "direct"))) (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "QI")]) (define_expand "mul3_highpart" @@ -7060,6 +7094,7 @@ (const_string "vector") (const_string "double"))) (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "DI")]) (define_insn "*mulsi3_highpart_1" @@ -7083,6 +7118,7 @@ (const_string "vector") (const_string "double"))) (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "SI")]) (define_insn "*mulsi3_highpart_zext" @@ -7106,6 +7142,7 @@ (const_string "vector") (const_string "double"))) (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "direct") (set_attr "mode" "SI")]) ;; The patterns that match these are at the end of this file. @@ -9094,7 +9131,8 @@ (set_attr "prefix_0f" "1") (set_attr "mode" "DI") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "vector")]) + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) (define_insn "x86_shld" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") @@ -9110,7 +9148,8 @@ (set_attr "mode" "SI") (set_attr "pent_pair" "np") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "vector")]) + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) (define_expand "x86_shift_adj_1" [(set (reg:CCZ FLAGS_REG) @@ -9791,7 +9830,8 @@ (set_attr "prefix_0f" "1") (set_attr "mode" "DI") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "vector")]) + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) (define_insn "x86_shrd" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") @@ -9807,7 +9847,8 @@ (set_attr "mode" "SI") (set_attr "pent_pair" "np") (set_attr "athlon_decode" "vector") - (set_attr "amdfam10_decode" "vector")]) + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) (define_insn 
"ashrdi3_cvt" [(set (match_operand:DI 0 "nonimmediate_operand" "=*d,rm") @@ -12931,7 +12972,8 @@ [(set_attr "type" "fpspc") (set_attr "mode" "XF") (set_attr "athlon_decode" "direct") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct")]) (define_insn "sqrt_extendxf2_i387" [(set (match_operand:XF 0 "register_operand" "=f") @@ -12943,7 +12985,8 @@ [(set_attr "type" "fpspc") (set_attr "mode" "XF") (set_attr "athlon_decode" "direct") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "direct")]) (define_insn "*rsqrtsf2_sse" [(set (match_operand:SF 0 "register_operand" "=x") @@ -12977,7 +13020,8 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "") (set_attr "athlon_decode" "*") - (set_attr "amdfam10_decode" "*")]) + (set_attr "amdfam10_decode" "*") + (set_attr "bdver1_decode" "*")]) (define_expand "sqrt2" [(set (match_operand:MODEF 0 "register_operand" "") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 635a460..4fe7e5c 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2226,6 +2226,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "vector,double") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "double,direct") (set_attr "mode" "SF")]) (define_insn "*avx_cvtsi2ssq" @@ -2255,6 +2256,7 @@ (set_attr "prefix_rex" "1") (set_attr "athlon_decode" "vector,double") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "double,direct") (set_attr "mode" "SF")]) (define_insn "sse_cvtss2si" @@ -2268,6 +2270,7 @@ "%vcvtss2si\t{%1, %0|%0, %1}" [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI")]) @@ -2281,6 +2284,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" 
"double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI")]) @@ -2296,6 +2300,7 @@ "%vcvtss2si{q}\t{%1, %0|%0, %1}" [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) @@ -2309,6 +2314,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) @@ -2324,6 +2330,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI")]) @@ -2339,6 +2346,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) @@ -2453,7 +2461,8 @@ [(set_attr "type" "ssecvt") (set_attr "unit" "mmx") (set_attr "prefix_data16" "1") - (set_attr "mode" "DI")]) + (set_attr "mode" "DI") + (set_attr "bdver1_decode" "double")]) (define_insn "sse2_cvttpd2pi" [(set (match_operand:V2SI 0 "register_operand" "=y") @@ -2463,7 +2472,8 @@ [(set_attr "type" "ssecvt") (set_attr "unit" "mmx") (set_attr "prefix_data16" "1") - (set_attr "mode" "TI")]) + (set_attr "mode" "TI") + (set_attr "bdver1_decode" "double")]) (define_insn "*avx_cvtsi2sd" [(set (match_operand:V2DF 0 "register_operand" "=x") @@ -2490,7 +2500,8 @@ [(set_attr "type" "sseicvt") (set_attr "mode" "DF") (set_attr "athlon_decode" "double,direct") - (set_attr "amdfam10_decode" "vector,double")]) + (set_attr "amdfam10_decode" "vector,double") + (set_attr 
"bdver1_decode" "double,direct")]) (define_insn "*avx_cvtsi2sdq" [(set (match_operand:V2DF 0 "register_operand" "=x") @@ -2519,7 +2530,8 @@ (set_attr "prefix_rex" "1") (set_attr "mode" "DF") (set_attr "athlon_decode" "double,direct") - (set_attr "amdfam10_decode" "vector,double")]) + (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "double,direct")]) (define_insn "sse2_cvtsd2si" [(set (match_operand:SI 0 "register_operand" "=r,r") @@ -2532,6 +2544,7 @@ "%vcvtsd2si\t{%1, %0|%0, %1}" [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI")]) @@ -2545,6 +2558,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI")]) @@ -2560,6 +2574,7 @@ "%vcvtsd2siq\t{%1, %0|%0, %1}" [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) @@ -2573,6 +2588,7 @@ [(set_attr "type" "sseicvt") (set_attr "athlon_decode" "double,vector") (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double") (set_attr "prefix_rep" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) @@ -2590,7 +2606,8 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "SI") (set_attr "athlon_decode" "double,vector") - (set_attr "amdfam10_decode" "double,double")]) + (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double")]) (define_insn "sse2_cvttsd2siq" [(set (match_operand:DI 0 "register_operand" "=r,r") @@ -2605,7 +2622,8 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI") (set_attr "athlon_decode" "double,vector") - (set_attr 
"amdfam10_decode" "double,double")]) + (set_attr "amdfam10_decode" "double,double") + (set_attr "bdver1_decode" "double,double")]) (define_insn "avx_cvtdq2pd256" [(set (match_operand:V4DF 0 "register_operand" "=x") @@ -2673,7 +2691,8 @@ (set_attr "prefix_data16" "0") (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI") - (set_attr "amdfam10_decode" "double")]) + (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "double")]) (define_insn "avx_cvttpd2dq256" [(set (match_operand:V4SI 0 "register_operand" "=x") @@ -2703,7 +2722,8 @@ [(set_attr "type" "ssecvt") (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI") - (set_attr "amdfam10_decode" "double")]) + (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "double")]) (define_insn "*avx_cvtsd2ss" [(set (match_operand:V4SF 0 "register_operand" "=x") @@ -2732,6 +2752,7 @@ [(set_attr "type" "ssecvt") (set_attr "athlon_decode" "vector,double") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "direct,direct") (set_attr "mode" "SF")]) (define_insn "*avx_cvtss2sd" @@ -2762,6 +2783,7 @@ "cvtss2sd\t{%2, %0|%0, %2}" [(set_attr "type" "ssecvt") (set_attr "amdfam10_decode" "vector,double") + (set_attr "bdver1_decode" "direct,direct") (set_attr "mode" "DF")]) (define_insn "avx_cvtpd2ps256" @@ -2796,7 +2818,8 @@ (set_attr "prefix_data16" "1") (set_attr "prefix" "maybe_vex") (set_attr "mode" "V4SF") - (set_attr "amdfam10_decode" "double")]) + (set_attr "amdfam10_decode" "double") + (set_attr "bdver1_decode" "double")]) (define_insn "avx_cvtps2pd256" [(set (match_operand:V4DF 0 "register_operand" "=x") @@ -2832,7 +2855,8 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "V2DF") (set_attr "prefix_data16" "0") - (set_attr "amdfam10_decode" "direct")]) + (set_attr "amdfam10_decode" "direct") + (set_attr "bdver1_decode" "double")]) (define_expand "vec_unpacks_hi_v4sf" [(set (match_dup 2) -- 1.6.3.3