From patchwork Thu Sep 4 04:28:16 2014
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 385721
Message-ID: <1409804896.3163.55.camel@gnopaine>
Subject: [PATCH, rs6000] Handle vec_extract and splat patterns in analyze_swaps
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 03 Sep 2014 23:28:16 -0500

Hi,

This patch adds more special handling to analyze_swaps so that more
computations can be optimized.  Previously I had disallowed VEC_SELECT in
all cases.  This is now changed to allow a select of a single lane, either
for an extract operation or for a splat operation.  If a computation
containing such operations is optimized, the selected lane is changed to
count from the other end of the vector.  Several new tests are added to
check that these opportunities are now exploited.
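To make the lane arithmetic concrete: once the element-order swaps are
removed on little-endian, a select of lane i from an N-element vector must
become a select of lane N-1-i, which is the GEN_INT (nunits - 1 - INTVAL
(...)) computation in adjust_extract below.  The following stand-alone
sketch is illustration only and is not part of the patch; the helper name
mirror_lane is made up for this example.

/* Illustration only: the lane renumbering performed by adjust_extract,
   expressed on plain integers.  mirror_lane is a hypothetical helper
   used just for this sketch, not something in the patch.  */
#include <assert.h>

static int
mirror_lane (int nunits, int lane)
{
  /* Same formula as GEN_INT (nunits - 1 - INTVAL (...)) in adjust_extract.  */
  return nunits - 1 - lane;
}

int
main (void)
{
  /* V2DI/V2DF extract: lane 0 becomes lane 1, and vice versa.  */
  assert (mirror_lane (2, 0) == 1);
  assert (mirror_lane (2, 1) == 0);
  /* V4SF splat of lane 1 becomes a splat of lane 2.  */
  assert (mirror_lane (4, 1) == 2);
  return 0;
}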
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2014-09-03  Bill Schmidt

	* config/rs6000/rs6000.c (special_handling_values): Add SH_EXTRACT.
	(rtx_is_swappable_p): Look for patterns with a VEC_SELECT, perhaps
	wrapped in a VEC_DUPLICATE, representing an extract.  Mark these as
	swappable with special handling SH_EXTRACT.  Remove
	UNSPEC_VSX_XXSPLTW from the list of disallowed unspecs for the
	optimization.
	(adjust_extract): New function.
	(handle_special_swappables): Add default to case statement; add
	case for SH_EXTRACT that calls adjust_extract.
	(dump_swap_insn_table): Handle SH_EXTRACT.

[gcc/testsuite]

2014-09-03  Bill Schmidt

	* gcc.target/powerpc/swaps-p8-13.c: New test.
	* gcc.target/powerpc/swaps-p8-14.c: New test.
	* gcc.target/powerpc/swaps-p8-15.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 214879)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -33562,7 +33562,8 @@ enum special_handling_values {
   SH_CONST_VECTOR,
   SH_SUBREG,
   SH_NOSWAP_LD,
-  SH_NOSWAP_ST
+  SH_NOSWAP_ST,
+  SH_EXTRACT
 };
 
 /* Union INSN with all insns containing definitions that reach USE.
@@ -33704,6 +33705,7 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
 {
   enum rtx_code code = GET_CODE (op);
   int i, j;
+  rtx parallel;
 
   switch (code)
     {
@@ -33714,7 +33716,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
       return 1;
 
     case VEC_CONCAT:
-    case VEC_SELECT:
     case ASM_INPUT:
     case ASM_OPERANDS:
       return 0;
@@ -33732,9 +33733,31 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
         handling.  */
       if (GET_CODE (XEXP (op, 0)) == CONST_INT)
        return 1;
+      else if (GET_CODE (XEXP (op, 0)) == REG
+              && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
+       /* This catches V2DF and V2DI splat, at a minimum.  */
+       return 1;
+      else if (GET_CODE (XEXP (op, 0)) == VEC_SELECT)
+       /* If the duplicated item is from a select, defer to the select
+          processing to see if we can change the lane for the splat.  */
+       return rtx_is_swappable_p (XEXP (op, 0), special);
       else
        return 0;
 
+    case VEC_SELECT:
+      /* A vec_extract operation is ok if we change the lane.  */
+      if (GET_CODE (XEXP (op, 0)) == REG
+         && GET_MODE_INNER (GET_MODE (XEXP (op, 0))) == GET_MODE (op)
+         && GET_CODE ((parallel = XEXP (op, 1))) == PARALLEL
+         && XVECLEN (parallel, 0) == 1
+         && GET_CODE (XVECEXP (parallel, 0, 0)) == CONST_INT)
+       {
+         *special = SH_EXTRACT;
+         return 1;
+       }
+      else
+       return 0;
+
     case UNSPEC:
       {
        /* Various operations are unsafe for this optimization, at least
@@ -33777,7 +33800,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
           || val == UNSPEC_VSX_CVSPDPN
           || val == UNSPEC_VSX_SET
           || val == UNSPEC_VSX_SLDWI
-          || val == UNSPEC_VSX_XXSPLTW
           || val == UNSPEC_VUNPACK_HI_SIGN
           || val == UNSPEC_VUNPACK_HI_SIGN_DIRECT
           || val == UNSPEC_VUNPACK_LO_SIGN
@@ -34115,6 +34137,27 @@ permute_store (rtx_insn *insn)
            INSN_UID (insn));
 }
 
+/* Given OP that contains a vector extract operation, change the index
+   of the extracted lane to count from the other side of the vector.  */
+static void
+adjust_extract (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  /* The vec_select may be wrapped in a vec_duplicate for a splat, so
+     account for that.  */
+  rtx sel = (GET_CODE (body) == VEC_DUPLICATE
+            ? XEXP (XEXP (body, 0), 1)
+            : XEXP (body, 1));
+  rtx par = XEXP (sel, 1);
+  int nunits = GET_MODE_NUNITS (GET_MODE (XEXP (sel, 0)));
+  XVECEXP (par, 0, 0) = GEN_INT (nunits - 1 - INTVAL (XVECEXP (par, 0, 0)));
+  INSN_CODE (insn) = -1;	/* Force re-recognition.  */
+  df_insn_rescan (insn);
+
+  if (dump_file)
+    fprintf (dump_file, "Changing lane for extract %d\n", INSN_UID (insn));
+}
+
 /* The insn described by INSN_ENTRY[I] can be swapped, but only with
    special handling.  Take care of that here.  */
 static void
@@ -34125,6 +34168,8 @@ handle_special_swappables (swap_web_entry *insn_en
 
   switch (insn_entry[i].special_handling)
     {
+    default:
+      gcc_unreachable ();
     case SH_CONST_VECTOR:
       {
        /* A CONST_VECTOR will only show up somewhere in the RHS of a SET.  */
@@ -34151,6 +34196,9 @@ handle_special_swappables (swap_web_entry *insn_en
       /* Convert a non-permuting store to a permuting one.  */
       permute_store (insn);
       break;
+    case SH_EXTRACT:
+      /* Change the lane on an extract operation.  */
+      adjust_extract (insn);
     }
 }
 
@@ -34219,6 +34267,8 @@ dump_swap_insn_table (swap_web_entry *insn_entry)
            fputs ("special:load ", dump_file);
          else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
            fputs ("special:store ", dump_file);
+         else if (insn_entry[i].special_handling == SH_EXTRACT)
+           fputs ("special:extract ", dump_file);
        }
       if (insn_entry[i].web_not_optimizable)
        fputs ("unoptimizable ", dump_file);
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c	(working copy)
@@ -0,0 +1,53 @@
+/* { dg-do run { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+
+#include <altivec.h>
+void abort ();
+
+#define N 4096
+long long ca[N] __attribute__((aligned(16)));
+long long cb[N] __attribute__((aligned(16)));
+long long cc[N] __attribute__((aligned(16)));
+long long cd[N] __attribute__((aligned(16)));
+long long x;
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  vector long long va, vb, vc, vd, tmp;
+  volatile unsigned long long three = 3;
+  vector unsigned long long threes = vec_splats (three);
+  for (i = 0; i < N; i+=2) {
+    vb = vec_vsx_ld (0, (vector long long *)&cb[i]);
+    vc = vec_vsx_ld (0, (vector long long *)&cc[i]);
+    vd = vec_vsx_ld (0, (vector long long *)&cd[i]);
+    tmp = vec_add (vb, vc);
+    tmp = vec_sub (tmp, vd);
+    tmp = vec_sra (tmp, threes);
+    x = vec_extract (tmp, 0);
+    vec_vsx_st (tmp, 0, (vector long long *)&ca[i]);
+  }
+}
+
+__attribute__((noinline)) void init ()
+{
+  int i;
+  for (i = 0; i < N; ++i) {
+    cb[i] = 3 * i - 2048;
+    cc[i] = -5 * i + 93;
+    cd[i] = i + 14;
+  }
+}
+
+int main ()
+{
+  int i;
+  init ();
+  foo ();
+  for (i = 0; i < N; ++i)
+    if (ca[i] != (-3 * i - 1969) >> 3)
+      abort ();
+  if (x != ca[N-1])
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler "stxsdx" } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 1 } } */
+
+/* The only xxpermdi expected is for the vec_splats.  */
+
+#include <altivec.h>
+void abort ();
+
+#define N 4096
+long long ca[N] __attribute__((aligned(16)));
+long long cb[N] __attribute__((aligned(16)));
+long long cc[N] __attribute__((aligned(16)));
+long long cd[N] __attribute__((aligned(16)));
+long long x;
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  vector long long va, vb, vc, vd, tmp;
+  volatile unsigned long long three = 3;
+  vector unsigned long long threes = vec_splats (three);
+  for (i = 0; i < N; i+=2) {
+    vb = vec_vsx_ld (0, (vector long long *)&cb[i]);
+    vc = vec_vsx_ld (0, (vector long long *)&cc[i]);
+    vd = vec_vsx_ld (0, (vector long long *)&cd[i]);
+    tmp = vec_add (vb, vc);
+    tmp = vec_sub (tmp, vd);
+    tmp = vec_sra (tmp, threes);
+    x = vec_extract (tmp, 0);
+    vec_vsx_st (tmp, 0, (vector long long *)&ca[i]);
+  }
+}
+
+int main ()
+{
+  foo ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c	(working copy)
@@ -0,0 +1,49 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler "xxspltw" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+void abort();
+
+typedef struct xx {vector double l; vector double h;} xx;
+
+#define N 4096
+#define M 10000000
+vector float ca[N][4] = {0};
+vector float cb[N][4] = {0};
+vector float cc[N][4] = {0};
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  for (i = 0; i < N; i++) {
+    cc[i][0] = vec_mul(vec_splats(cb[i][0][0]), ca[i][0]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][1]), ca[i][1]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][2]), ca[i][2]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][3]), ca[i][3]);
+
+    cc[i][1] = vec_mul(vec_splats(cb[i][1][0]), ca[i][0]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][1]), ca[i][1]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][2]), ca[i][2]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][3]), ca[i][3]);
+
+    cc[i][2] = vec_mul(vec_splats(cb[i][2][0]), ca[i][0]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][1]), ca[i][1]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][2]), ca[i][2]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][3]), ca[i][3]);
+
+    cc[i][3] = vec_mul(vec_splats(cb[i][3][0]), ca[i][0]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][1]), ca[i][1]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][2]), ca[i][2]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][3]), ca[i][3]);
+  }
+}
+
+int main ()
+{
+  foo ();
+  return 0;
+}