From patchwork Mon Dec 5 10:44:32 2011
X-Patchwork-Submitter: Ramana Radhakrishnan
X-Patchwork-Id: 129260
Date: Mon, 5 Dec 2011 10:44:32 +0000
Subject: [Patch ARM] Use vcvt.f32/64.s32 with immediate bits to do fixed to floating point conversions better.
From: Ramana Radhakrishnan
To: gcc-patches
Cc: Patch Tracking

The original RFC is here - http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01961.html

>
>        * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
>        * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
>        * config/arm/constraints.md ("Dt"): New constraint.
>        * config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
>        New.
>        * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
>        (*arm_combine_vcvt_f32_u32): New.

After testing this recently and having received no other feedback on the RFC, I've now committed the attached patch.

Ramana

2011-12-05  Ramana Radhakrishnan

        * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
        * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
        * config/arm/constraints.md ("Dt"): New constraint.
        * config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
        New.
        * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
        (*arm_combine_vcvt_f32_u32): New.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c (revision 182004)
+++ gcc/config/arm/arm.c (working copy)
@@ -17671,6 +17671,11 @@
       }
       return;
 
+    case 'v':
+      gcc_assert (GET_CODE (x) == CONST_DOUBLE);
+      fprintf (stream, "#%d", vfp3_const_double_for_fract_bits (x));
+      return;
+
     /* Register specifier for vld1.16/vst1.16.  Translate the S register
        number into a D register number and element index.  */
     case 'z':
@@ -25038,4 +25043,27 @@
   return count;
 }
 
+int
+vfp3_const_double_for_fract_bits (rtx operand)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (GET_CODE (operand) != CONST_DOUBLE)
+    return 0;
+
+  REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
+  if (exact_real_inverse (DFmode, &r0))
+    {
+      if (exact_real_truncate (DFmode, &r0))
+        {
+          HOST_WIDE_INT value = real_to_integer (&r0);
+          value = value & 0xffffffff;
+          if ((value != 0) && ( (value & (value - 1)) == 0))
+            return int_log2 (value);
+        }
+    }
+  return 0;
+}
+
 #include "gt-arm.h"
+
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h (revision 182004)
+++ gcc/config/arm/arm-protos.h (working copy)
@@ -241,6 +241,7 @@
 };
 
 extern const struct tune_params *current_tune;
+extern int vfp3_const_double_for_fract_bits (rtx);
 #endif /* RTX_CODE */
 
 #endif /* ! GCC_ARM_PROTOS_H */
Index: gcc/config/arm/vfp.md
===================================================================
--- gcc/config/arm/vfp.md (revision 182004)
+++ gcc/config/arm/vfp.md (working copy)
@@ -1144,9 +1144,40 @@
    (set_attr "type" "fcmpd")]
 )
 
 
+;; Fixed point to floating point conversions.
+(define_code_iterator FCVT [unsigned_float float])
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+(define_insn "*combine_vcvt_f32_<FCVTI32typename>"
+  [(set (match_operand:SF 0 "s_register_operand" "=t")
+        (mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
+                 (match_operand 2
+                        "const_double_vcvt_power_of_two_reciprocal" "Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
+  "vcvt.f32.<FCVTI32typename>\\t%0, %1, %v2"
+  [(set_attr "predicable" "no")
+   (set_attr "type" "f_cvt")]
+)
+
+;; Not the ideal way of implementing this.  Ideally we would be able to split
+;; this into a move to a DP register and then a vcvt.f64.i32
+(define_insn "*combine_vcvt_f64_<FCVTI32typename>"
+  [(set (match_operand:DF 0 "s_register_operand" "=x,x,w")
+        (mult:DF (FCVT:DF (match_operand:SI 1 "s_register_operand" "r,t,r"))
+                 (match_operand 2
+                     "const_double_vcvt_power_of_two_reciprocal" "Dt,Dt,Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math
+  && !TARGET_VFP_SINGLE"
+  "@
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f64\\t%0, %1, %1\; vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2"
+  [(set_attr "predicable" "no")
+   (set_attr "type" "f_cvt")
+   (set_attr "length" "8")]
+)
+
 ;; Store multiple insn used in function prologue.
-
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
     [(set (match_operand:BLK 0 "memory_operand" "=m")
Index: gcc/config/arm/constraints.md
===================================================================
--- gcc/config/arm/constraints.md (revision 182004)
+++ gcc/config/arm/constraints.md (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -291,6 +291,12 @@
   (and (match_code "const_double")
        (match_test "TARGET_32BIT && TARGET_VFP_DOUBLE && vfp3_const_double_rtx (op)")))
 
+(define_constraint "Dt"
+ "@internal
+  In ARM/ Thumb2 a const_double which can be used with a vcvt.f32.s32 with fract bits operation"
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_fract_bits (op)")))
+
 (define_memory_constraint "Ut"
  "@internal
   In ARM/Thumb-2 state an address valid for loading/storing opaque structure
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md (revision 182004)
+++ gcc/config/arm/predicates.md (working copy)
@@ -754,6 +754,11 @@
   return true;
 })
 
+(define_predicate "const_double_vcvt_power_of_two_reciprocal"
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP
+                    && vfp3_const_double_for_fract_bits (op)")))
+
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2)")))
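
For readers who want to see what the new predicate accepts, here is a rough standalone sketch in plain C. It is not part of the patch and uses no GCC internals: `fract_bits_for_constant` is a hypothetical helper that approximates `vfp3_const_double_for_fract_bits` for positive constants only, using `frexp` in place of `exact_real_inverse` and `int_log2`. `q16_to_float` is an assumed example of the kind of source expression the new `*combine_vcvt_f32_` patterns are meant to match; whether a given compiler build actually emits a single `vcvt.f32.s32` with `#16` for it depends on the target options listed in the insn condition.

```c
#include <math.h>

/* Hypothetical standalone analogue of vfp3_const_double_for_fract_bits,
   restricted to positive constants: returns n when c == 1.0 / 2^n with
   1 <= n <= 31, and 0 otherwise (0 means "not usable").  */
static int
fract_bits_for_constant (double c)
{
  int exp;
  int n;

  /* Reject zero, negatives, NaN and infinity.  */
  if (!(c > 0.0) || isinf (c))
    return 0;

  /* frexp returns m in [0.5, 1) with c == m * 2^exp, so c is an exact
     power of two only when m == 0.5.  */
  if (frexp (c, &exp) != 0.5)
    return 0;

  /* c == 2^(exp - 1), hence 1/c == 2^(1 - exp).  */
  n = 1 - exp;
  return (n >= 1 && n <= 31) ? n : 0;
}

/* Assumed example input: Q16.16 fixed point to float.  The multiplier is
   1/2^16, so the sketch above reports 16 fraction bits for it.  */
static float
q16_to_float (int x)
{
  return (float) x * (1.0f / 65536.0f);
}
```

As in the patch, a result of 0 doubles as the rejection value, which is why a multiplier of exactly 1.0 (zero fraction bits) is not accepted.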