From patchwork Tue Nov 13 13:50:12 2012
X-Patchwork-Submitter: Ulrich Weigand
X-Patchwork-Id: 198681
Message-Id: <201211131350.qADDoCIc011668@d06av02.portsmouth.uk.ibm.com>
Subject: [PATCH, ARM] Improved core -> NEON extend
To: gcc-patches@gcc.gnu.org
Date: Tue, 13 Nov 2012 14:50:12 +0100 (CET)
From: "Ulrich Weigand"
Cc: ramrad01@arm.com

Hello,

here's another of Andrew's patches to improve NEON usage.  This one was
originally posted here:
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01213.html

The idea is to improve SImode to DImode extends that also move from core
registers to NEON registers.  In this situation, the compiler used to
perform the extension in core registers first and then move to NEON,
wasting a core register in the process.  The patch changes this to move
to NEON first and extend there.

[ This patch requires both the NEON shift and the lower-subreg patches,
both of which are now in mainline, so this patch is ready to merge as well
at this point.
]

Tested on arm-linux-gnueabi.

OK for mainline?

Bye,
Ulrich


2012-11-13  Andrew Stubbs
            Ulrich Weigand

gcc/
        * config/arm/arm.md (zero_extend<mode>di2): Add extra alternatives
        for NEON registers.
        Add alternative for one-instruction extend-in-place.
        (extend<mode>di2): Likewise.
        Add constraints for Thumb-mode memory loads.
        Prevent extend splitters doing NEON alternatives.
        * config/arm/iterators.md (qhs_extenddi_cstr, qhs_zextenddi_cstr):
        Adjust constraints to add new alternatives.
        * config/arm/neon.md: Add splitters for zero- and sign-extend.

gcc/testsuite/
        * gcc.target/arm/neon-extend-1.c: New file.
        * gcc.target/arm/neon-extend-2.c: New file.

=== modified file 'gcc/config/arm/arm.md'
--- gcc/config/arm/arm.md	2012-09-19 12:57:52 +0000
+++ gcc/config/arm/arm.md	2012-09-19 13:19:31 +0000
@@ -4567,33 +4567,36 @@
 ;; Zero and sign extension instructions.
 
 (define_insn "zero_extend<mode>di2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w,r,?r")
         (zero_extend:DI (match_operand:QHSI 1 "<qhs_zextenddi_op>"
                                             "<qhs_zextenddi_cstr>")))]
   "TARGET_32BIT <qhs_zextenddi_cond>"
   "#"
-  [(set_attr "length" "8")
+  [(set_attr "length" "8,4,8")
    (set_attr "ce_count" "2")
    (set_attr "predicable" "yes")]
 )
 
 (define_insn "extend<mode>di2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w,r,?r,?r")
         (sign_extend:DI (match_operand:QHSI 1 "<qhs_extenddi_op>"
                                             "<qhs_extenddi_cstr>")))]
   "TARGET_32BIT <qhs_sextenddi_cond>"
   "#"
-  [(set_attr "length" "8")
+  [(set_attr "length" "8,4,8,8")
    (set_attr "ce_count" "2")
    (set_attr "shift" "1")
-   (set_attr "predicable" "yes")]
+   (set_attr "predicable" "yes")
+   (set_attr "arch" "*,*,a,t")]
 )
 
 ;; Splits for all extensions to DImode
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (zero_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+                    || (reload_completed
+                        && !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
@@ -4619,7 +4622,9 @@
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (sign_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+                    || (reload_completed
+                        && !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (ashiftrt:SI (match_dup 1) (const_int 31)))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);

=== modified file 'gcc/config/arm/iterators.md'
--- gcc/config/arm/iterators.md	2012-09-19 12:57:52 +0000
+++ gcc/config/arm/iterators.md	2012-09-19 13:19:31 +0000
@@ -412,8 +412,8 @@
 (define_mode_attr qhs_extenddi_op [(SI "s_register_operand")
                                    (HI "nonimmediate_operand")
                                    (QI "arm_reg_or_extendqisi_mem_op")])
-(define_mode_attr qhs_extenddi_cstr [(SI "r") (HI "rm") (QI "rUq")])
-(define_mode_attr qhs_zextenddi_cstr [(SI "r") (HI "rm") (QI "rm")])
+(define_mode_attr qhs_extenddi_cstr [(SI "r,0,r,r") (HI "r,0,rm,rm") (QI "r,0,rUq,rm")])
+(define_mode_attr qhs_zextenddi_cstr [(SI "r,0,r") (HI "r,0,rm") (QI "r,0,rm")])
 
 ;; Mode attributes used for fixed-point support.
 (define_mode_attr qaddsub_suf [(V4UQQ "8") (V2UHQ "16") (UQQ "8") (UHQ "16")

=== modified file 'gcc/config/arm/neon.md'
--- gcc/config/arm/neon.md	2012-09-19 12:57:52 +0000
+++ gcc/config/arm/neon.md	2012-09-19 13:19:31 +0000
@@ -5878,3 +5878,65 @@
                     (const_string "neon_fp_vadd_qqq_vabs_qq"))
                     (const_string "neon_int_5")))]
 )
+
+;; Copy from core-to-neon regs, then extend, not vice-versa
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (sign_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V2SI (match_dup 1)))
+   (set (match_dup 0) (ashiftrt:DI (match_dup 0) (const_int 32)))]
+  {
+    operands[2] = gen_rtx_REG (V2SImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (sign_extend:DI (match_operand:HI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V4HI (match_dup 1)))
+   (set (match_dup 0) (ashiftrt:DI (match_dup 0) (const_int 48)))]
+  {
+    operands[2] = gen_rtx_REG (V4HImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (sign_extend:DI (match_operand:QI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V8QI (match_dup 1)))
+   (set (match_dup 0) (ashiftrt:DI (match_dup 0) (const_int 56)))]
+  {
+    operands[2] = gen_rtx_REG (V8QImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (zero_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V2SI (match_dup 1)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (const_int 32)))]
+  {
+    operands[2] = gen_rtx_REG (V2SImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (zero_extend:DI (match_operand:HI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V4HI (match_dup 1)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (const_int 48)))]
+  {
+    operands[2] = gen_rtx_REG (V4HImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (zero_extend:DI (match_operand:QI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V8QI (match_dup 1)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (const_int 56)))]
+  {
+    operands[2] = gen_rtx_REG (V8QImode, REGNO (operands[0]));
+  })

=== added file 'gcc/testsuite/gcc.target/arm/neon-extend-1.c'
--- gcc/testsuite/gcc.target/arm/neon-extend-1.c	1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/neon-extend-1.c	2012-09-19 13:19:31 +0000
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+void
+f (unsigned int a)
+{
+  unsigned long long b = a;
+  asm volatile ("@ extended to %0" : : "w" (b));
+}
+
+/* { dg-final { scan-assembler "vdup.32" } } */
+/* { dg-final { scan-assembler "vshr.u64" } } */

=== added file 'gcc/testsuite/gcc.target/arm/neon-extend-2.c'
--- gcc/testsuite/gcc.target/arm/neon-extend-2.c	1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/neon-extend-2.c	2012-09-19 13:19:31 +0000
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+void
+f (int a)
+{
+  long long b = a;
+  asm volatile ("@ extended to %0" : : "w" (b));
+}
+
+/* { dg-final { scan-assembler "vdup.32" } } */
+/* { dg-final { scan-assembler "vshr.s64" } } */
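
As a sanity check on the arithmetic behind the new neon.md splitters, here
is a small host-side model in plain C.  It is only a sketch and not part of
the patch: the function names (dup_v2si, dup_v4hi, dup_v8qi, lshiftrt_di,
ashiftrt_di, extend_model_self_check) are made up for illustration, and a
D register is modelled as a plain uint64_t.  It shows that duplicating the
narrow value into every lane and then shifting the whole 64-bit value right
by 32, 48 or 56 bits leaves the zero- or sign-extended value, which is what
the vec_duplicate plus lshiftrt/ashiftrt pairs above rely on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of (vec_duplicate:V2SI/V4HI/V8QI x): the scalar is
   replicated into every lane of a 64-bit register.  */
uint64_t dup_v2si (uint32_t x) { return ((uint64_t) x << 32) | x; }
uint64_t dup_v4hi (uint16_t x) { return (uint64_t) x * 0x0001000100010001ULL; }
uint64_t dup_v8qi (uint8_t x)  { return (uint64_t) x * 0x0101010101010101ULL; }

/* Model of (lshiftrt:DI v n): logical right shift, 1 <= n <= 63.  */
uint64_t lshiftrt_di (uint64_t v, unsigned n) { return v >> n; }

/* Model of (ashiftrt:DI v n): arithmetic right shift, 1 <= n <= 63.
   Written with explicit sign fill so it does not depend on how the host
   compiler shifts negative signed values.  */
uint64_t ashiftrt_di (uint64_t v, unsigned n)
{
  uint64_t r = v >> n;
  if (v >> 63)
    r |= ~(UINT64_MAX >> n);
  return r;
}

/* Check the three shift counts used by the splitters: 32 for SI,
   48 for HI, 56 for QI.  Results are compared as 64-bit bit patterns.  */
void extend_model_self_check (void)
{
  /* zero_extend */
  assert (lshiftrt_di (dup_v2si (0x80000001u), 32) == 0x0000000080000001ULL);
  assert (lshiftrt_di (dup_v4hi (0x8001u), 48) == 0x0000000000008001ULL);
  assert (lshiftrt_di (dup_v8qi (0x80u), 56) == 0x0000000000000080ULL);

  /* sign_extend */
  assert (ashiftrt_di (dup_v2si (0x80000001u), 32) == 0xFFFFFFFF80000001ULL);
  assert (ashiftrt_di (dup_v4hi (0x8001u), 48) == 0xFFFFFFFFFFFF8001ULL);
  assert (ashiftrt_di (dup_v8qi (0x80u), 56) == 0xFFFFFFFFFFFFFF80ULL);
  assert (ashiftrt_di (dup_v8qi (0x7Fu), 56) == 0x000000000000007FULL);
}
```

Calling extend_model_self_check () from a small main is enough to exercise
it.  Note that the two new tests only scan for vdup.32 together with
vshr.u64 or vshr.s64, so only the SI case is covered by the testsuite
additions; the HI and QI rows of the model follow from the V4HI and V8QI
splitters and are not checked by the tests in this patch.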