To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, anton@samba.org
From: Bill Schmidt
Subject: [PATCH, rs6000] Fix PR72863 (swap optimization misses swaps generated from intrinsics)
Date: Thu, 11 Aug 2016 13:39:04 -0500

Hi,

Anton reports in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72863 that
using the vec_vsx_ld and vec_vsx_st intrinsics leaves the endian swaps in
the generated code, even for very simple computations.  The cause is that
we don't generate the swaps for these intrinsics at expand time, as we do
for other vector moves.  Instead, they aren't generated until split time,
so the swap optimization pass never sees them.

This patch fixes the problem in the obvious way.  For little-endian
targets without P9 vector support, the vsx_load_<mode> and
vsx_store_<mode> expanders now emit the swaps directly at expand time by
calling rs6000_emit_le_vsx_move.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  One new test case is added.  Is this ok for trunk?

I would also like to backport this to the 6 and 5 branches after some
burn-in time.  I do not plan to rush this into 6.2.  It will have to wait
for 6.3, since this is only a performance issue, albeit an important one.

Thanks,
Bill


[gcc]

2016-08-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/72863
	* vsx.md (vsx_load_<mode>): For P8LE, emit swaps at expand time.
	(vsx_store_<mode>): Likewise.

[gcc/testsuite]

2016-08-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/72863
	* gcc.target/powerpc/pr72863.c: New test.


Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 239310)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -922,13 +922,27 @@
   [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
 	(match_operand:VSX_M 1 "memory_operand" ""))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "")
+{
+  /* Expand to swaps if needed, prior to swap optimization.  */
+  if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
+})
 
 (define_expand "vsx_store_<mode>"
   [(set (match_operand:VSX_M 0 "memory_operand" "")
 	(match_operand:VSX_M 1 "vsx_register_operand" ""))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "")
+{
+  /* Expand to swaps if needed, prior to swap optimization.  */
+  if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
+})
 
 ;; Explicit load/store expanders for the builtin functions for lxvd2x, etc.,
 ;; when you really want their element-reversing behavior.
Index: gcc/testsuite/gcc.target/powerpc/pr72863.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr72863.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr72863.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+
+extern unsigned char *src, *dst;
+
+void b(void)
+{
+  int i;
+
+  unsigned char *s8 = src;
+  unsigned char *d8 = dst;
+
+  for (i = 0; i < 100; i++) {
+    vector unsigned char vs = vec_vsx_ld(0, s8);
+    vector unsigned char vd = vec_vsx_ld(0, d8);
+    vector unsigned char vr = vec_xor(vs, vd);
+    vec_vsx_st(vr, 0, d8);
+    s8 += 16;
+    d8 += 16;
+  }
+}
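The following is a simplified C model of why the new test can expect no xxpermdi once the swaps exist before swap optimization runs. It is a sketch, not GCC's implementation. It assumes the usual little-endian POWER8 behavior: lxvd2x and stxvd2x move doublewords in big-endian element order, so the compiler pairs each one with an xxswapd (an xxpermdi form). The vec_xor in the test is lane-insensitive, so all three swaps (two after the loads, one before the store) can be dropped without changing what lands in memory. The vreg type and all function names are hypothetical, chosen for this illustration only. The model tracks doublewords and ignores byte order within each doubleword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit VSX register as two doublewords;
   dw[0] is what the program treats as element 0. */
typedef struct { uint64_t dw[2]; } vreg;

/* xxswapd: exchange the two doublewords of a register. */
vreg xxswapd(vreg v) {
    vreg r = { { v.dw[1], v.dw[0] } };
    return r;
}

/* Model of lxvd2x on little-endian POWER8: the doublewords arrive
   swapped relative to little-endian element order. */
vreg lxvd2x_le(const uint64_t mem[2]) {
    assert(mem != NULL);
    vreg r = { { mem[1], mem[0] } };
    return r;
}

/* Model of stxvd2x on little-endian POWER8: stores doublewords swapped. */
void stxvd2x_le(vreg v, uint64_t mem[2]) {
    assert(mem != NULL);
    mem[0] = v.dw[1];
    mem[1] = v.dw[0];
}

/* Lane-insensitive operation: xor(swap(a), swap(b)) == swap(xor(a, b)). */
vreg vxor(vreg a, vreg b) {
    vreg r = { { a.dw[0] ^ b.dw[0], a.dw[1] ^ b.dw[1] } };
    return r;
}

/* Sequence before swap optimization: swap after each load, swap before the store. */
void xor_with_swaps(const uint64_t s[2], uint64_t d[2]) {
    vreg vs = xxswapd(lxvd2x_le(s));
    vreg vd = xxswapd(lxvd2x_le(d));
    stxvd2x_le(xxswapd(vxor(vs, vd)), d);
}

/* Sequence after swap optimization has removed all three swaps. */
void xor_without_swaps(const uint64_t s[2], uint64_t d[2]) {
    vreg vs = lxvd2x_le(s);
    vreg vd = lxvd2x_le(d);
    stxvd2x_le(vxor(vs, vd), d);
}

/* True if both sequences leave identical results in memory. */
bool swaps_are_redundant(const uint64_t s[2], const uint64_t d[2]) {
    uint64_t d1[2], d2[2];
    memcpy(d1, d, sizeof d1);
    memcpy(d2, d, sizeof d2);
    xor_with_swaps(s, d1);
    xor_without_swaps(s, d2);
    return memcmp(d1, d2, sizeof d1) == 0;
}
```

For example, with s = {1, 2} and d = {3, 5}, both sequences leave d = {2, 7}. Because XOR commutes with the doubleword swap, swaps_are_redundant holds for any inputs. A lane-sensitive operation, such as extracting a single element, would not have this property.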