From patchwork Tue Dec 19 13:59:48 2017
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 850834
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, cel@linux.vnet.ibm.com
From: Bill Schmidt
Subject: [PATCH, rs6000] Don't optimize swaps when a swap has mixed use
Date: Tue, 19 Dec 2017 07:59:48 -0600
Message-Id: <2a44ada4-e1d0-3b39-7a88-7f82f55f7066@linux.vnet.ibm.com>

Hi,

Carl Love is working on a patch to add missing
flavors of the vec_xst_be intrinsic and test cases to cover all flavors.
He ran into a latent bug in swap optimization that this patch addresses.

Swap optimization operates on the principle that a computation can have
swaps removed if all permuting loads are accompanied by a swap, all
permuting stores are accompanied by a swap, and the remaining vector
computations are lane-insensitive or easy to adjust if lanes are swapped
across doublewords.  A new problem that arises with vec_xl_be and
vec_xst_be is that the same swap may accompany both a load and a store,
so that removing that swap changes the semantics of the program.

Suppose we have a vec_xl from *(a+b) followed by a vec_xst_be to *(c+d).
The code at expand time then looks like:

  lxvd2x   x,a,b
  xxswapd  x,x
  stxvd2x  x,c,d

The first two instructions are generated by vec_xl, while the last is
generated by vec_xst_be.  Swap optimization removes the xxswapd because
this sequence satisfies the rules, but now we have the same result as if
the vec_xst_be were actually a vec_xst.

To avoid this, this patch marks a computation as unoptimizable if it
contains a swap that is both fed by a permuting load and feeds into a
permuting store.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu for POWER8 with
no regressions.  Carl has verified this fixes the related problems in his
test cases under development.  Is this okay for trunk?

Thanks,
Bill


2017-12-19  Bill Schmidt

        * config/rs6000/rs6000-p8swap.c (swap_feeds_both_load_and_store):
        New function.
        (rs6000_analyze_swaps): Mark a web unoptimizable if it contains a
        swap associated with both a load and a store.


Index: gcc/config/rs6000/rs6000-p8swap.c
===================================================================
--- gcc/config/rs6000/rs6000-p8swap.c (revision 255801)
+++ gcc/config/rs6000/rs6000-p8swap.c (working copy)
@@ -327,6 +327,38 @@ insn_is_swap_p (rtx insn)
   return 1;
 }
 
+/* Return 1 iff UID, known to reference a swap, is both fed by a load
+   and a feeder of a store.  */
+static unsigned int
+swap_feeds_both_load_and_store (swap_web_entry *insn_entry)
+{
+  rtx insn = insn_entry->insn;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def, use;
+  struct df_link *link = 0;
+  rtx_insn *load = 0, *store = 0;
+  unsigned int fed_by_load = 0;
+  unsigned int feeds_store = 0;
+
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      link = DF_REF_CHAIN (use);
+      load = DF_REF_INSN (link->ref);
+      if (insn_is_load_p (load) && insn_is_swap_p (load))
+        fed_by_load = 1;
+    }
+
+  FOR_EACH_INSN_INFO_DEF (def, insn_info)
+    {
+      link = DF_REF_CHAIN (def);
+      store = DF_REF_INSN (link->ref);
+      if (insn_is_store_p (store) && insn_is_swap_p (store))
+        feeds_store = 1;
+    }
+
+  return fed_by_load & feeds_store;
+}
+
 /* Return TRUE if insn is a swap fed by a load from the constant pool.  */
 static bool
 const_load_sequence_p (swap_web_entry *insn_entry, rtx insn)
@@ -2029,6 +2061,14 @@ rs6000_analyze_swaps (function *fun)
                    && !insn_entry[i].is_swap && !insn_entry[i].is_swappable)
             root->web_not_optimizable = 1;
 
+          /* If we have a swap that is both fed by a permuting load
+             and a feeder of a permuting store, then the optimization
+             isn't appropriate.  (Consider vec_xl followed by vec_xst_be.)  */
+          else if (insn_entry[i].is_swap && !insn_entry[i].is_load
+                   && !insn_entry[i].is_store
+                   && swap_feeds_both_load_and_store (&insn_entry[i]))
+            root->web_not_optimizable = 1;
+
           /* If we have permuting loads or stores that are not accompanied
              by a register swap, the optimization isn't appropriate.  */
           else if (insn_entry[i].is_load && insn_entry[i].is_swap)
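
For readers who want to see why dropping the xxswapd is wrong in the
vec_xl / vec_xst_be case, here is a toy model in plain C.  It is only a
sketch under stated assumptions: a register is modelled as two
doublewords, and the helpers model_lxvd2x, model_xxswapd, model_stxvd2x,
and toy_swap_has_mixed_use are hypothetical names invented for this
illustration.  They are not GCC internals and not a full description of
the instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a vector register as two doublewords.  Index 0 is the
   doubleword that lxvd2x fills from the lower address.  Every name in
   this block is hypothetical and exists only for illustration.  */
typedef struct { uint64_t dw[2]; } vreg;

/* Model of lxvd2x: lower-address doubleword lands in dw[0].  */
static vreg model_lxvd2x (const uint64_t mem[2])
{
  vreg r = { { mem[0], mem[1] } };
  return r;
}

/* Model of xxswapd: exchange the two doublewords.  */
static vreg model_xxswapd (vreg r)
{
  vreg s = { { r.dw[1], r.dw[0] } };
  return s;
}

/* Model of stxvd2x: dw[0] is written to the lower address.  */
static void model_stxvd2x (vreg r, uint64_t mem[2])
{
  mem[0] = r.dw[0];
  mem[1] = r.dw[1];
}

/* vec_xl followed by vec_xst_be as expanded: load, swap, store.  */
static void xl_then_xst_be (const uint64_t src[2], uint64_t dst[2])
{
  model_stxvd2x (model_xxswapd (model_lxvd2x (src)), dst);
}

/* The same sequence with the single swap removed, which is what swap
   optimization would produce without the new check.  */
static void swap_removed (const uint64_t src[2], uint64_t dst[2])
{
  model_stxvd2x (model_lxvd2x (src), dst);
}

/* Simplified stand-in for the mixed-use test: the two flags replace
   the use-def and def-use chain walks done in the real patch.  */
struct toy_swap
{
  bool fed_by_permuting_load;
  bool feeds_permuting_store;
};

static bool toy_swap_has_mixed_use (const struct toy_swap *s)
{
  assert (s != 0);
  return s->fed_by_permuting_load && s->feeds_permuting_store;
}
```

In this model, xl_then_xst_be stores the two doublewords in exchanged
order, while swap_removed stores them unchanged.  The latter is the
vec_xst result described above, so a web containing such a swap has to
be left alone.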