From patchwork Sun Oct 5 22:06:03 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 396711
Message-ID: <1412546763.2986.126.camel@gnopaine>
Subject: [PATCH, rs6000] Document issues with permutes for analyze_swaps
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Sun, 05 Oct 2014 17:06:03 -0500

Hi,

I spent some time thinking about handling vperm instructions in the
analyze_swaps pass, and convinced myself that it isn't necessarily wise
to do so.
At the least it will require adding a cost model to the pass to
determine whether a computation involving permutes should be optimized.
At this time I don't intend to implement this, but I want to record the
information about how it could be done should it be deemed necessary.
So this patch just adds a few paragraphs of documentation about the
issue.

No change in behavior intended. I've ensured that rs6000.c still
compiles successfully on powerpc64le-unknown-linux-gnu. Is this ok for
trunk?

Thanks,
Bill


2014-10-05  Bill Schmidt

	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
	discussion of permutes and why we don't handle them.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 215907)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
    than deleting a swap, we convert the load/store into a permuting
    load/store (which effectively removes the swap).  */
 
+/* Notes on Permutes
+
+   We do not currently handle computations that contain permutes.  There
+   is a general transformation that can be performed correctly, but it
+   may introduce more expensive code than it replaces.  To handle these
+   would require a cost model to determine when to perform the optimization.
+   This commentary records how this could be done if desired.
+
+   The most general permute is something like this (example for V16QI):
+
+   (vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
+                     (parallel [(const_int a0) (const_int a1)
+                                 ...
+                                (const_int a14) (const_int a15)]))
+
+   where a0,...,a15 are in [0,31] and select elements from op1 and op2
+   to produce the result.
+
+   Regardless of mode, we can convert the PARALLEL to a mask of 16
+   byte-element selectors.  Let's call this M, with M[i] representing
+   the ith byte-element selector value.  Then if we swap doublewords
+   throughout the computation, we can get correct behavior by replacing
+   M with M' as follows:
+
+            { M[i+8]+8 : i < 8, M[i+8] in [0,7] U [16,23]
+    M'[i] = { M[i+8]-8 : i < 8, M[i+8] in [8,15] U [24,31]
+            { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
+            { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
+
+   This seems promising at first, since we are just replacing one mask
+   with another.  But certain masks are preferable to others.  If M
+   is a mask that matches a vmrghh pattern, for example, M' certainly
+   will not.  Instead of a single vmrghh, we would generate a load of
+   M' and a vperm.  So we would need to know how many xxswapd's we can
+   remove as a result of this transformation to determine if it's
+   profitable; and preferably the logic would need to be aware of all
+   the special preferable masks.
+
+   Another form of permute is an UNSPEC_VPERM, in which the mask is
+   already in a register.  In some cases, this mask may be a constant
+   that we can discover with ud-chains, in which case the above
+   transformation is ok.  However, the common usage here is for the
+   mask to be produced by an UNSPEC_LVSL, in which case the mask
+   cannot be known at compile time.  In such a case we would have to
+   generate several instructions to compute M' as above at run time,
+   and a cost model is needed again.  */
+
 /* This is based on the union-find logic in web.c.  web_entry_base
    is defined in df.h.  */
 class swap_web_entry : public web_entry_base
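
For anyone who wants to sanity-check the M' rule in the commentary, here
is a minimal standalone sketch. It is not part of the patch and not code
from rs6000.c. The function names (swap_adjust_mask, swap_doublewords,
permute, mask_adjustment_holds) are hypothetical, and vectors are
modelled as plain 16-byte arrays. It relies on the observation that the
four cases collapse to M'[i] = M[i ^ 8] ^ 8, because XOR with 8 adds 8
to values in [0,7] U [16,23] and subtracts 8 from values in
[8,15] U [24,31].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from rs6000.c: build M' from M.
   Equivalent to the four-case definition in the commentary.  */
static void
swap_adjust_mask (const uint8_t m[16], uint8_t m_prime[16])
{
  for (int i = 0; i < 16; i++)
    {
      assert (m[i] < 32);	/* Selectors index the 32-byte op1:op2.  */
      m_prime[i] = m[i ^ 8] ^ 8;
    }
}

/* Swap the two doublewords (8-byte halves) of a 16-byte vector.  */
static void
swap_doublewords (const uint8_t in[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = in[i ^ 8];
}

/* Model of the general permute: select 16 bytes from op1:op2.  */
static void
permute (const uint8_t op1[16], const uint8_t op2[16],
	 const uint8_t m[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = m[i] < 16 ? op1[m[i]] : op2[m[i] - 16];
}

/* Return 1 if permuting doubleword-swapped operands with M' gives the
   doubleword-swapped result of the original permute, else 0.  */
static int
mask_adjustment_holds (const uint8_t op1[16], const uint8_t op2[16],
		       const uint8_t m[16])
{
  uint8_t m_prime[16], s1[16], s2[16];
  uint8_t r[16], r_swapped[16], r_prime[16];

  swap_adjust_mask (m, m_prime);
  swap_doublewords (op1, s1);
  swap_doublewords (op2, s2);
  permute (op1, op2, m, r);
  swap_doublewords (r, r_swapped);
  permute (s1, s2, m_prime, r_prime);
  return memcmp (r_swapped, r_prime, 16) == 0;
}
```

Feeding this an interleaving mask of the shape
{0,1,16,17,2,3,18,19,4,5,20,21,6,7,22,23} gives an M' that starts
{12,13,28,29,14,15,30,31,...}, which no longer has that shape. That is
the situation the commentary describes, where a cheap special-case
pattern turns into a mask load plus a vperm.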