From patchwork Sat Sep  6 17:50:22 2014
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 386644
Message-ID: <1410025822.3163.105.camel@gnopaine>
Subject: [PATCH, rs6000] Add handling for UNSPEC_VSPLT_DIRECT to analyze_swaps
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Sat, 06 Sep 2014 12:50:22 -0500

Hi,

Here's one more case of special handling that allows us to optimize more
vectorized loops in analyze_swaps.  UNSPEC_VSPLT_DIRECT is used in some
cases to avoid the possibility of an endian fixup.  We can still handle
this by swapping the lane chosen as the source of the splat.
While implementing this I realized that I had had a thinko with the
adjust_extract changes in the last related patch.  When swapping
doublewords, the right change is to add or subtract n_elts/2, not to
subtract from n_elts-1.  I've corrected that issue herein as well.

I've added a new test to demonstrate the UNSPEC_VSPLT_DIRECT case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


[gcc]

2014-09-06  Bill Schmidt

	* config/rs6000/rs6000.c (special_handling_values): Add SH_SPLAT.
	(rtx_is_swappable_p): Convert UNSPEC cascading ||s to a switch
	statement; allow optimization of UNSPEC_VSPLT_DIRECT with special
	handling SH_SPLAT.
	(adjust_extract): Fix test for VEC_DUPLICATE case; fix adjustment
	of extracted lane.
	(adjust_splat): New function.
	(handle_special_swappables): Call adjust_splat for SH_SPLAT.
	(dump_swap_insn_table): Add case for SH_SPLAT.

[gcc/testsuite]

2014-09-06  Bill Schmidt

	* gcc.target/powerpc/swaps-p8-16.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 214957)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -33524,7 +33524,8 @@ enum special_handling_values {
   SH_SUBREG,
   SH_NOSWAP_LD,
   SH_NOSWAP_ST,
-  SH_EXTRACT
+  SH_EXTRACT,
+  SH_SPLAT
 };
 
 /* Union INSN with all insns containing definitions that reach USE.
@@ -33735,43 +33736,50 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
	   vector splat are element-order sensitive.  A few of these
	   cases might be workable with special handling if required.  */
	int val = XINT (op, 1);
-	if (val == UNSPEC_VMRGH_DIRECT
-	    || val == UNSPEC_VMRGL_DIRECT
-	    || val == UNSPEC_VPACK_SIGN_SIGN_SAT
-	    || val == UNSPEC_VPACK_SIGN_UNS_SAT
-	    || val == UNSPEC_VPACK_UNS_UNS_MOD
-	    || val == UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
-	    || val == UNSPEC_VPACK_UNS_UNS_SAT
-	    || val == UNSPEC_VPERM
-	    || val == UNSPEC_VPERM_UNS
-	    || val == UNSPEC_VPERMHI
-	    || val == UNSPEC_VPERMSI
-	    || val == UNSPEC_VPKPX
-	    || val == UNSPEC_VSLDOI
-	    || val == UNSPEC_VSLO
-	    || val == UNSPEC_VSPLT_DIRECT
-	    || val == UNSPEC_VSRO
-	    || val == UNSPEC_VSUM2SWS
-	    || val == UNSPEC_VSUM4S
-	    || val == UNSPEC_VSUM4UBS
-	    || val == UNSPEC_VSUMSWS
-	    || val == UNSPEC_VSUMSWS_DIRECT
-	    || val == UNSPEC_VSX_CONCAT
-	    || val == UNSPEC_VSX_CVSPDP
-	    || val == UNSPEC_VSX_CVSPDPN
-	    || val == UNSPEC_VSX_SET
-	    || val == UNSPEC_VSX_SLDWI
-	    || val == UNSPEC_VUNPACK_HI_SIGN
-	    || val == UNSPEC_VUNPACK_HI_SIGN_DIRECT
-	    || val == UNSPEC_VUNPACK_LO_SIGN
-	    || val == UNSPEC_VUNPACK_LO_SIGN_DIRECT
-	    || val == UNSPEC_VUPKHPX
-	    || val == UNSPEC_VUPKHS_V4SF
-	    || val == UNSPEC_VUPKHU_V4SF
-	    || val == UNSPEC_VUPKLPX
-	    || val == UNSPEC_VUPKLS_V4SF
-	    || val == UNSPEC_VUPKHU_V4SF)
-	  return 0;
+	switch (val)
+	  {
+	  default:
+	    break;
+	  case UNSPEC_VMRGH_DIRECT:
+	  case UNSPEC_VMRGL_DIRECT:
+	  case UNSPEC_VPACK_SIGN_SIGN_SAT:
+	  case UNSPEC_VPACK_SIGN_UNS_SAT:
+	  case UNSPEC_VPACK_UNS_UNS_MOD:
+	  case UNSPEC_VPACK_UNS_UNS_MOD_DIRECT:
+	  case UNSPEC_VPACK_UNS_UNS_SAT:
+	  case UNSPEC_VPERM:
+	  case UNSPEC_VPERM_UNS:
+	  case UNSPEC_VPERMHI:
+	  case UNSPEC_VPERMSI:
+	  case UNSPEC_VPKPX:
+	  case UNSPEC_VSLDOI:
+	  case UNSPEC_VSLO:
+	  case UNSPEC_VSRO:
+	  case UNSPEC_VSUM2SWS:
+	  case UNSPEC_VSUM4S:
+	  case UNSPEC_VSUM4UBS:
+	  case UNSPEC_VSUMSWS:
+	  case UNSPEC_VSUMSWS_DIRECT:
+	  case UNSPEC_VSX_CONCAT:
+	  case UNSPEC_VSX_CVSPDP:
+	  case UNSPEC_VSX_CVSPDPN:
+	  case UNSPEC_VSX_SET:
+	  case UNSPEC_VSX_SLDWI:
+	  case UNSPEC_VUNPACK_HI_SIGN:
+	  case UNSPEC_VUNPACK_HI_SIGN_DIRECT:
+	  case UNSPEC_VUNPACK_LO_SIGN:
+	  case UNSPEC_VUNPACK_LO_SIGN_DIRECT:
+	  case UNSPEC_VUPKHPX:
+	  case UNSPEC_VUPKHS_V4SF:
+	  case UNSPEC_VUPKHU_V4SF:
+	  case UNSPEC_VUPKLPX:
+	  case UNSPEC_VUPKLS_V4SF:
+	  case UNSPEC_VUPKLU_V4SF:
+	    return 0;
+	  case UNSPEC_VSPLT_DIRECT:
+	    *special = SH_SPLAT;
+	    return 1;
+	  }
      }

    default:
@@ -34098,20 +34106,20 @@ permute_store (rtx_insn *insn)
	     INSN_UID (insn));
 }
 
-/* Given OP that contains a vector extract operation, change the index
-   of the extracted lane to count from the other side of the vector.  */
+/* Given OP that contains a vector extract operation, adjust the index
+   of the extracted lane to account for the doubleword swap.  */
 static void
 adjust_extract (rtx_insn *insn)
 {
-  rtx body = PATTERN (insn);
+  rtx src = SET_SRC (PATTERN (insn));
   /* The vec_select may be wrapped in a vec_duplicate for a splat, so
      account for that.  */
-  rtx sel = (GET_CODE (body) == VEC_DUPLICATE
-	     ? XEXP (XEXP (body, 0), 1)
-	     : XEXP (body, 1));
+  rtx sel = GET_CODE (src) == VEC_DUPLICATE ? XEXP (src, 0) : src;
   rtx par = XEXP (sel, 1);
-  int nunits = GET_MODE_NUNITS (GET_MODE (XEXP (sel, 0)));
-  XVECEXP (par, 0, 0) = GEN_INT (nunits - 1 - INTVAL (XVECEXP (par, 0, 0)));
+  int half_elts = GET_MODE_NUNITS (GET_MODE (XEXP (sel, 0))) >> 1;
+  int lane = INTVAL (XVECEXP (par, 0, 0));
+  lane = lane >= half_elts ? lane - half_elts : lane + half_elts;
+  XVECEXP (par, 0, 0) = GEN_INT (lane);
   INSN_CODE (insn) = -1; /* Force re-recognition.  */
   df_insn_rescan (insn);
@@ -34119,6 +34127,24 @@ adjust_extract (rtx_insn *insn)
     fprintf (dump_file, "Changing lane for extract %d\n", INSN_UID (insn));
 }
 
+/* Given OP that contains a vector direct-splat operation, adjust the index
+   of the source lane to account for the doubleword swap.  */
+static void
+adjust_splat (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  rtx unspec = XEXP (body, 1);
+  int half_elts = GET_MODE_NUNITS (GET_MODE (unspec)) >> 1;
+  int lane = INTVAL (XVECEXP (unspec, 0, 1));
+  lane = lane >= half_elts ? lane - half_elts : lane + half_elts;
+  XVECEXP (unspec, 0, 1) = GEN_INT (lane);
+  INSN_CODE (insn) = -1; /* Force re-recognition.  */
+  df_insn_rescan (insn);
+
+  if (dump_file)
+    fprintf (dump_file, "Changing lane for splat %d\n", INSN_UID (insn));
+}
+
 /* The insn described by INSN_ENTRY[I] can be swapped, but only
    with special handling.  Take care of that here.  */
 static void
@@ -34160,6 +34186,11 @@ handle_special_swappables (swap_web_entry *insn_en
     case SH_EXTRACT:
       /* Change the lane on an extract operation.  */
       adjust_extract (insn);
+      break;
+    case SH_SPLAT:
+      /* Change the lane on a direct-splat operation.  */
+      adjust_splat (insn);
+      break;
     }
 }
 
@@ -34230,6 +34261,8 @@ dump_swap_insn_table (swap_web_entry *insn_entry)
	    fputs ("special:store ", dump_file);
	  else if (insn_entry[i].special_handling == SH_EXTRACT)
	    fputs ("special:extract ", dump_file);
+	  else if (insn_entry[i].special_handling == SH_SPLAT)
+	    fputs ("special:splat ", dump_file);
	}
      if (insn_entry[i].web_not_optimizable)
	fputs ("unoptimizable ", dump_file);
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-16.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-16.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-16.c	(working copy)
@@ -0,0 +1,56 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler "vspltw" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+void abort();
+
+typedef struct xx {vector double l; vector double h;} xx;
+
+#define N 4096
+#define M 10000000
+vector float ca[N][4] = {0};
+vector float cb[N][4] = {0};
+vector float cc[N][4] = {0};
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  vector float brow;
+
+  for (i = 0; i < N; i++) {
+
+    brow = cb[i][0];
+    cc[i][0] = vec_mul(vec_splats(brow[0]), ca[i][0]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(brow[1]), ca[i][1]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(brow[2]), ca[i][2]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(brow[3]), ca[i][3]);
+
+    brow = cb[i][1];
+    cc[i][1] = vec_mul(vec_splats(brow[0]), ca[i][0]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(brow[1]), ca[i][1]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(brow[2]), ca[i][2]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(brow[3]), ca[i][3]);
+
+    brow = cb[i][2];
+    cc[i][2] = vec_mul(vec_splats(brow[0]), ca[i][0]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(brow[1]), ca[i][1]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(brow[2]), ca[i][2]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(brow[3]), ca[i][3]);
+
+    brow = cb[i][3];
+    cc[i][3] = vec_mul(vec_splats(brow[0]), ca[i][0]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(brow[1]), ca[i][1]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(brow[2]), ca[i][2]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(brow[3]), ca[i][3]);
+  }
+}
+
+int main ()
+{
+  foo ();
+  return 0;
+}