From patchwork Fri Mar 21 01:38:14 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 332472
Message-ID: <1395365894.3599.35.camel@gnopaine>
Subject: [PATCH, rs6000] More efficient vector permute for little endian
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Thu, 20 Mar 2014 20:38:14 -0500

Hi,

The original workaround for vector permute on a little endian platform
includes subtracting each element of the permute control vector from 31.
Because the upper 3 bits of each element are unimportant, this was
implemented as subtracting the whole vector from a splat of -1.
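
For anyone who wants to check the arithmetic, here is a minimal C sketch
(not part of the patch; the function names are made up for illustration)
that models a single selector byte. It compares subtracting from 31,
subtracting from a splat of -1 (0xff per byte), and taking the bitwise
complement, keeping only the five low-order bits that the permute uses.

```c
#include <assert.h>
#include <stdint.h>

/* Only the low-order five bits of a permute selector byte are used.  */
#define SEL_MASK 0x1f

/* Reference adjustment: subtract the selector from 31.  */
uint8_t sel_sub_from_31 (uint8_t x)
{
  return (uint8_t) (31 - x) & SEL_MASK;
}

/* Subtract from a splat of -1, i.e. 0xff in each byte.  */
uint8_t sel_sub_from_splat (uint8_t x)
{
  return (uint8_t) (0xff - x) & SEL_MASK;
}

/* Bitwise complement, which is what NOR of x with itself computes.  */
uint8_t sel_nor (uint8_t x)
{
  return (uint8_t) ~(x | x) & SEL_MASK;
}

/* Return 1 if all three forms agree for every possible selector byte.  */
int selectors_agree (void)
{
  for (unsigned v = 0; v < 256; v++)
    {
      uint8_t x = (uint8_t) v;
      if (sel_sub_from_31 (x) != sel_sub_from_splat (x)
          || sel_sub_from_31 (x) != sel_nor (x))
        return 0;
    }
  return 1;
}

/* Hypothetical self-check helper; aborts if the forms ever disagree.  */
void selector_self_check (void)
{
  assert (selectors_agree ());
}
```

selectors_agree () returns 1: for a byte, 0xff - x and ~x are the same
value, and 0xff is congruent to 31 modulo 32, so all three forms match in
the low five bits.
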
On reflection this can be done more efficiently with a vector nor
operation.  This patch makes that change.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


2014-03-20  Bill Schmidt

	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate a
	pattern for vector nor instead of subtract from splat(-1).
	(altivec_expand_vec_perm_le): Likewise.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 208704)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5622,12 +5622,10 @@ rs6000_expand_vector_set (rtx target, rtx val, int
       else
         {
           /* Invert selector.  */
-          rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
-                                             gen_rtx_CONST_INT (QImode, -1));
+          rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
+          rtx andx = gen_rtx_AND (V16QImode, notx, notx);
           rtx tmp = gen_reg_rtx (V16QImode);
-          emit_move_insn (tmp, splat);
-          x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
-          emit_move_insn (tmp, x);
+          emit_move_insn (tmp, andx);
 
           /* Permute with operands reversed and adjusted selector.  */
           x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
@@ -30321,18 +30319,18 @@ altivec_expand_vec_perm_const_le (rtx operands[4])
 
 /* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
    permute control vector.  But here it's not a constant, so we must
-   generate a vector splat/subtract to do the adjustment.  */
+   generate a vector NOR to do the adjustment.  */
 
 void
 altivec_expand_vec_perm_le (rtx operands[4])
 {
-  rtx splat, unspec;
+  rtx notx, andx, unspec;
   rtx target = operands[0];
   rtx op0 = operands[1];
   rtx op1 = operands[2];
   rtx sel = operands[3];
   rtx tmp = target;
-  rtx splatreg = gen_reg_rtx (V16QImode);
+  rtx norreg = gen_reg_rtx (V16QImode);
   enum machine_mode mode = GET_MODE (target);
 
   /* Get everything in regs so the pattern matches.  */
@@ -30345,18 +30343,14 @@ altivec_expand_vec_perm_le (rtx operands[4])
   if (!REG_P (target))
     tmp = gen_reg_rtx (mode);
 
-  /* SEL = splat(31) - SEL.  */
-  /* We want to subtract from 31, but we can't vspltisb 31 since
-     it's out of range.  -1 works as well because only the low-order
-     five bits of the permute control vector elements are used.  */
-  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
-                                 gen_rtx_CONST_INT (QImode, -1));
-  emit_move_insn (splatreg, splat);
-  sel = gen_rtx_MINUS (V16QImode, splatreg, sel);
-  emit_move_insn (splatreg, sel);
+  /* Invert the selector with a VNOR.  */
+  notx = gen_rtx_NOT (V16QImode, sel);
+  andx = gen_rtx_AND (V16QImode, notx, notx);
+  emit_move_insn (norreg, andx);
 
   /* Permute with operands reversed and adjusted selector.  */
-  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, splatreg), UNSPEC_VPERM);
+  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
+                           UNSPEC_VPERM);
 
   /* Copy into target, possibly by way of a register.  */
   if (!REG_P (target))