From patchwork Wed Oct  9 23:11:33 2013
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 282088
Message-ID: <1381360293.6275.31.camel@gnopaine>
Subject: [PATCH, rs6000] Fix variable permute control vectors for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 09 Oct 2013 18:11:33 -0500

Hi,

This is a follow-up to the recent patch that fixed constant permute
control vectors for little endian.  When the control vector is
constant, we can adjust the constant and use a vperm without
increasing code size.
When the control vector is unknown, however, we have to generate two
additional instructions to subtract each element of the control vector
from 31 (equivalently, from -1, since only 5 bits are pertinent).
This patch adds the additional code generation.

There are two main paths to the affected permutes: via the known
pattern vec_perm, and via an altivec builtin.  The builtin path causes
a little difficulty because there's no way to dispatch a builtin to
two different insns for BE and LE.  I solved this by adding two new
unspecs for the builtins (UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X).  The
insns for the builtins are changed from a define_insn to a
define_insn_and_split.  We create the _X forms at expand time and
later split them into the correct sequences for BE and LE, using the
"real" UNSPEC_VPERM and UNSPEC_VPERM_UNS to generate the vperm
instruction.  For the path via the known pattern, I added a new
routine in rs6000.c in similar fashion to the solution for the
constant control vector case.

When the permute control vector is a rotate vector loaded by lvsl or
lvsr, we can generate the desired control vector more cheaply by
simply changing to use the opposite instruction.  We are already doing
that when expanding an unaligned load.  The changes in vector.md avoid
undoing that effort by circumventing the subtract-from-splat (going
straight to the UNSPEC_VPERM).

I bootstrapped and tested this for big endian on
powerpc64-unknown-linux-gnu with no new regressions.  I did the same
for little endian on powerpc64le-unknown-linux-gnu.  Here the results
were slightly mixed: the changes fix 32 test failures, but expose an
unrelated bug in 9 others when -mvsx is permitted on LE (not currently
allowed).  The bug is a missing permute for a vector load in the
unaligned vector load logic that will be fixed in a subsequent patch.

Is this okay for trunk?
Thanks,
Bill


2013-10-09  Bill Schmidt

	* config/rs6000/vector.md (vec_realign_load_<mode>): Generate vperm
	directly to circumvent subtract from splat{31} workaround.
	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
	* config/rs6000/altivec.md (define_c_enum "unspec"): Add
	UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
	(altivec_vperm_<mode>): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_internal): New define_insn.
	(altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_uns_internal): New define_insn.
	(vec_permv16qi): Add little endian logic.


Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203246)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -950,8 +950,15 @@
     emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
 					 operands[2], operands[3]));
   else
-    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[2],
-					 operands[1], operands[3]));
+    {
+      /* Avoid the "subtract from splat31" workaround for vperm since
+	 we have changed lvsr to lvsl instead.  */
+      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+				   gen_rtvec (3, operands[2],
+					      operands[1], operands[3]),
+				   UNSPEC_VPERM);
+      emit_move_insn (operands[0], unspec);
+    }
   DONE;
 })

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 203246)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -56,6 +56,7 @@ extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203247)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28608,6 +28608,54 @@ altivec_expand_vec_perm_const_le (rtx operands[4])
   emit_move_insn (target, unspec);
 }
 
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+   permute control vector.  But here it's not a constant, so we must
+   generate a vector splat/subtract to do the adjustment.  */
+
+void
+altivec_expand_vec_perm_le (rtx operands[4])
+{
+  rtx splat, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+  rtx tmp = target;
+
+  /* Get everything in regs so the pattern matches.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+  if (!REG_P (sel))
+    sel = force_reg (V16QImode, sel);
+  if (!REG_P (target))
+    tmp = gen_reg_rtx (V16QImode);
+
+  /* SEL = splat(31) - SEL.  */
+  /* We want to subtract from 31, but we can't vspltisb 31 since
+     it's out of range.  -1 works as well because only the low-order
+     five bits of the permute control vector elements are used.  */
+  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+				 gen_rtx_CONST_INT (QImode, -1));
+  emit_move_insn (tmp, splat);
+  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
+  emit_move_insn (tmp, sel);
+
+  /* Permute with operands reversed and adjusted selector.  */
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
+			   UNSPEC_VPERM);
+
+  /* Copy into target, possibly by way of a register.  */
+  if (!REG_P (target))
+    {
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 203244)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -59,6 +59,8 @@
    UNSPEC_VSUMSWS
    UNSPEC_VPERM
    UNSPEC_VPERM_UNS
+   UNSPEC_VPERM_X
+   UNSPEC_VPERM_UNS_X
    UNSPEC_VRFIN
    UNSPEC_VCFUX
    UNSPEC_VCFSX
@@ -1279,21 +1281,91 @@
   "vrfiz %0,%1"
   [(set_attr "type" "vecfloat")])
 
-(define_insn "altivec_vperm_<mode>"
+(define_insn_and_split "altivec_vperm_<mode>"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				  gen_rtvec (3, operands[1],
+					     operands[2], operands[3]),
+				  UNSPEC_VPERM);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+	 it's out of range.  -1 works as well because only the low-order
+	 five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+					 gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				    gen_rtvec (3, operands[2],
+					       operands[1], tmp),
+				    UNSPEC_VPERM);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vperm_<mode>_uns"
+(define_insn_and_split "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_UNS_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				  gen_rtvec (3, operands[1],
+					     operands[2], operands[3]),
+				  UNSPEC_VPERM_UNS);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+	 it's out of range.  -1 works as well because only the low-order
+	 five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+					 gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+				    gen_rtvec (3, operands[2],
+					       operands[1], tmp),
+				    UNSPEC_VPERM_UNS);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_uns_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
@@ -1306,7 +1378,12 @@
 	   (match_operand:V16QI 3 "register_operand" "")]
 	  UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "")
+{
+  if (!BYTES_BIG_ENDIAN) {
+    altivec_expand_vec_perm_le (operands);
+    DONE;
+  }
+})
 
 (define_expand "vec_perm_constv16qi"
   [(match_operand:V16QI 0 "register_operand" "")