From patchwork Fri Nov 15 17:45:19 2013
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 291634
Message-ID: <1384537519.8213.138.camel@gnopaine>
Subject: [PATCH, rs6000] Correct programmer access to vperm for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Fri, 15 Nov 2013 11:45:19 -0600

Hi,

A previous patch of mine was misguided.  It modified the altivec_vperm_*
patterns to use the little-endian conversion trick of reversing the input
operands and complementing the permute control vector.  Looking at the
Altivec manual, we really can't do this.  These patterns need to be direct
pass-throughs to the vperm instruction, as shown in Figure 4-95 on page 130
of http://www.freescale.com/files/32bit/doc/ref_manual/ALTIVECPIM.pdf.
Section 4.2 on page 49 confirms that big-endian byte ordering is to be used
with the Altivec instruction descriptions.

This patch reverts that specific change, cleans up some associated
commentary elsewhere, and modifies the one test case affected by the
change.  gcc.dg/vmx/3b-15.c now performs the argument reversal and control
vector complementing in the source code, as all users will need to do when
porting code containing vec_perm calls to little endian; a small
stand-alone illustration of that source-level change follows the ChangeLog
below.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


gcc:

2013-11-15  Bill Schmidt

	* config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
	Remove.
	(altivec_vperm_<mode>): Revert earlier little endian change.
	(*altivec_vperm_<mode>_internal): Remove.
	(altivec_vperm_<mode>_uns): Revert earlier little endian change.
	(*altivec_vperm_<mode>_uns_internal): Remove.
	* config/rs6000/vector.md (vec_realign_load_<mode>): Revise
	commentary.

gcc/testsuite:

2013-11-15  Bill Schmidt

	* gcc.dg/vmx/3b-15.c: Revise for little endian.
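As a side note for users doing this port, the recipe the revised test case
follows can be written as a small wrapper.  This is only an illustrative
sketch and not part of the patch: the wrapper name is mine, and the use of
vec_nor to complement the selector relies on vperm reading only the
low-order five bits of each control byte (so a bitwise complement selects
the same elements as subtracting from 31).

#include <altivec.h>

/* Illustrative only: on big endian, vec_perm passes straight through to
   vperm; on little endian, the programmer swaps the two data inputs and
   complements the permute control vector.  Since vperm uses only the
   low-order five bits of each control byte, vec_nor (c, c) is equivalent
   to computing 31 - c element-wise.  */
vector unsigned char
permute_example (vector unsigned char a, vector unsigned char b,
                 vector unsigned char c)
{
#ifdef __BIG_ENDIAN__
  return vec_perm (a, b, c);
#else
  return vec_perm (b, a, vec_nor (c, c));
#endif
}
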
Index: gcc/testsuite/gcc.dg/vmx/3b-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/3b-15.c	(revision 204792)
+++ gcc/testsuite/gcc.dg/vmx/3b-15.c	(working copy)
@@ -3,7 +3,11 @@
 vector unsigned char
 f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
 {
+#ifdef __BIG_ENDIAN__
   return vec_perm(a,b,c);
+#else
+  return vec_perm(b,a,c);
+#endif
 }
 
 static void test()
@@ -12,8 +16,13 @@ static void test()
                        8,9,10,11,12,13,14,15}),
                   ((vector unsigned char){70,71,72,73,74,75,76,77,
                        78,79,80,81,82,83,84,85}),
+#ifdef __BIG_ENDIAN__
                   ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
                        0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
+#else
+                  ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
+                       0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
+#endif
                   ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
             "f");
 }
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 204792)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -966,8 +966,8 @@
                                              operands[2], operands[3]));
       else
         {
-          /* Avoid the "subtract from splat31" workaround for vperm since
-             we have changed lvsr to lvsl instead.  */
+          /* We have changed lvsr to lvsl, so to complete the transformation
+             of vperm for LE, we must swap the inputs.  */
           rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
                                        gen_rtvec (3, operands[2],
                                                   operands[1], operands[3]),
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204792)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -59,8 +59,6 @@
    UNSPEC_VSUMSWS
    UNSPEC_VPERM
    UNSPEC_VPERM_UNS
-   UNSPEC_VPERM_X
-   UNSPEC_VPERM_UNS_X
    UNSPEC_VRFIN
    UNSPEC_VCFUX
    UNSPEC_VCFSX
@@ -1393,91 +1391,21 @@
   "vrfiz %0,%1"
   [(set_attr "type" "vecfloat")])
 
-(define_insn_and_split "altivec_vperm_<mode>"
+(define_insn "altivec_vperm_<mode>"
   [(set (match_operand:VM 0 "register_operand" "=v")
         (unspec:VM [(match_operand:VM 1 "register_operand" "v")
                     (match_operand:VM 2 "register_operand" "v")
                     (match_operand:V16QI 3 "register_operand" "v")]
-                   UNSPEC_VPERM_X))]
-  "TARGET_ALTIVEC"
-  "#"
-  "!reload_in_progress && !reload_completed"
-  [(set (match_dup 0) (match_dup 4))]
-{
-  if (BYTES_BIG_ENDIAN)
-    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
-                                  gen_rtvec (3, operands[1],
-                                             operands[2], operands[3]),
-                                  UNSPEC_VPERM);
-  else
-    {
-      /* We want to subtract from 31, but we can't vspltisb 31 since
-         it's out of range.  -1 works as well because only the low-order
-         five bits of the permute control vector elements are used.  */
-      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
-                                         gen_rtx_CONST_INT (QImode, -1));
-      rtx tmp = gen_reg_rtx (V16QImode);
-      emit_move_insn (tmp, splat);
-      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
-      emit_move_insn (tmp, sel);
-      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
-                                    gen_rtvec (3, operands[2],
-                                               operands[1], tmp),
-                                    UNSPEC_VPERM);
-    }
-}
-  [(set_attr "type" "vecperm")])
-
-(define_insn "*altivec_vperm_<mode>_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "+v")]
                    UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
-(define_insn_and_split "altivec_vperm_<mode>_uns"
+(define_insn "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
         (unspec:VM [(match_operand:VM 1 "register_operand" "v")
                     (match_operand:VM 2 "register_operand" "v")
                     (match_operand:V16QI 3 "register_operand" "v")]
-                   UNSPEC_VPERM_UNS_X))]
-  "TARGET_ALTIVEC"
-  "#"
-  "!reload_in_progress && !reload_completed"
-  [(set (match_dup 0) (match_dup 4))]
-{
-  if (BYTES_BIG_ENDIAN)
-    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
-                                  gen_rtvec (3, operands[1],
-                                             operands[2], operands[3]),
-                                  UNSPEC_VPERM_UNS);
-  else
-    {
-      /* We want to subtract from 31, but we can't vspltisb 31 since
-         it's out of range.  -1 works as well because only the low-order
-         five bits of the permute control vector elements are used.  */
-      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
-                                         gen_rtx_CONST_INT (QImode, -1));
-      rtx tmp = gen_reg_rtx (V16QImode);
-      emit_move_insn (tmp, splat);
-      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
-      emit_move_insn (tmp, sel);
-      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
-                                    gen_rtvec (3, operands[2],
-                                               operands[1], tmp),
-                                    UNSPEC_VPERM_UNS);
-    }
-}
-  [(set_attr "type" "vecperm")])
-
-(define_insn "*altivec_vperm_<mode>_uns_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v")
-                    (match_operand:VM 2 "register_operand" "v")
-                    (match_operand:V16QI 3 "register_operand" "+v")]
                    UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
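
P.S.  For reference, the arithmetic behind the removed "subtract from a
splat of -1" trick can be checked in plain C.  This stand-alone snippet is
only an illustration of why -1 - x and 31 - x select the same vperm element
(vperm reads only the low-order five bits of each control byte); it is not
part of the patch.

#include <stdio.h>

int
main (void)
{
  /* For every possible control byte x, subtracting it from -1 (0xff) and
     subtracting it from 31 agree in the low-order five bits, which is all
     that vperm examines.  */
  for (int x = 0; x < 256; x++)
    {
      unsigned char from_31 = (unsigned char) (31 - x);
      unsigned char from_m1 = (unsigned char) (-1 - x);
      if ((from_31 & 0x1f) != (from_m1 & 0x1f))
        {
          printf ("mismatch at x = %d\n", x);
          return 1;
        }
    }
  printf ("31 - x and -1 - x agree in the low five bits for all bytes\n");
  return 0;
}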