From patchwork Tue Nov 15 20:03:49 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bill Schmidt X-Patchwork-Id: 695247 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3tJJM22mDwz9t0m for ; Wed, 16 Nov 2016 07:06:54 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="DC1I03Mm"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; q=dns; s=default; b=xN6Db BWdsIyJg+KksVO2zkRukp9jUisgp7IXcLj5hXon/Nbyezcf0EzynhwAWGowL3ozT UuskneJUbsJ3Wl25wHhuQGBrEWRa/BUpL6bI7Gx1ybKCzdTrOL9Z3+dx8m1HkiLK SXZERXvPyApmvldCFjSnNFPg8kQdN0D1b5w40s= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; s=default; bh=9FGXCEAQCtP eOfEFWYFOpYmf7+U=; b=DC1I03MmWy8UTqKZSTET1bqzzBgyif73LDEMOxUaIye f+yuT/AWUrVObBWWpayG7q2+NbgtK+UwQxRsZY87jMyJPaCfGZiiZRJJtipI+/Oo KD74X7YX5ZKEJnG7tQY/fwU47ws8JNTk64cAHNSkmBqpZSMEoNRsQa9O75DsF5kE = Received: (qmail 55115 invoked by alias); 15 Nov 2016 20:04:08 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 54931 invoked by uid 89); 15 Nov 2016 20:04:07 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=AWL, BAYES_00, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=single_succ_p, disturbs, 1m, 1n X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 15 Nov 2016 20:03:57 +0000 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uAFK3p9I052371 for ; Tue, 15 Nov 2016 15:03:55 -0500 Received: from e38.co.us.ibm.com (e38.co.us.ibm.com [32.97.110.159]) by mx0b-001b2d01.pphosted.com with ESMTP id 26r2cgunfc-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 15 Nov 2016 15:03:55 -0500 Received: from localhost by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 15 Nov 2016 13:03:55 -0700 Received: from d03dlp03.boulder.ibm.com (9.17.202.179) by e38.co.us.ibm.com (192.168.1.138) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 15 Nov 2016 13:03:52 -0700 Received: from b01cxnp22034.gho.pok.ibm.com (b01cxnp22034.gho.pok.ibm.com [9.57.198.24]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id EBAAC19D8042; Tue, 15 Nov 2016 13:03:13 -0700 (MST) Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com [9.57.199.109]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id uAFK3Kkn38142168; Tue, 15 Nov 2016 20:03:50 GMT Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 896BF112034; Tue, 15 Nov 2016 15:03:50 -0500 (EST) Received: from BigMac.local (unknown [9.85.138.244]) by b01ledav004.gho.pok.ibm.com (Postfix) with ESMTP id 45B65112047; Tue, 15 Nov 2016 15:03:50 -0500 (EST) To: GCC Patches , richard.guenther@gmail.com From: Bill Schmidt Subject: [PATCH] Fix PR77848 Date: Tue, 15 Nov 2016 14:03:49 -0600 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16111520-0028-0000-0000-0000060AAB0B X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006083; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000189; SDB=6.00781129; UDB=6.00376761; IPR=6.00558637; BA=6.00004883; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013337; XFM=3.00000011; UTC=2016-11-15 20:03:53 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16111520-0029-0000-0000-000030DF17C1 Message-Id: <1d9bddb6-2005-a53b-fbcf-46c9dfe244db@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-11-15_08:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1611150338 X-IsSubscribed: yes Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848 identifies a situation where if-conversion causes degradation when the if-converted loop is not subsequently vectorized. The if-conversion pass does not have a cost model to avoid such degradations. However, it does have a capability to version the if-converted loop, so that the vectorizer can choose the if-converted version if vectorization occurs, or the unmodified version if vectorization does not occur. Currently versioning is only done under special circumstances. This patch does two things: It requires loop versioning whenever loop vectorization is enabled so that such degradations can't occur; and it extends loop versioning to outer loops when such loops are of the right form for outer loop vectorization. The latter is needed to avoid introducing degradations with versioning of inner loops, which disturbs the pattern that outer loop vectorization expects. This is an embarrassingly simple patch, given how much time I spent going down other paths. The most surprising thing is that versioning the outer loop doesn't require any additional handshaking with the vectorizer. It just works. I've verified this on some examples, and we end up with the correct vectorization and with the unused loop nest discarded. The one remaining problem with this bug is that it precludes SLP from seeing if-converted loops to work on. With this patch, if the vectorizer can't vectorize an if-converted loop, the original version survives. We have one test case that fails when that happens, because it expected to do SLP vectorization on the if-converted statements: > FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "basic block vectorized" 1 > FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block vectorized" 1 Arguably, this shows a deficiency in SLP vectorization, since it won't see if-converted statements in non-loop code in any event. Eventually SLP should learn to handle these kinds of PHI statements itself. Bootstrapped and tested on powerpc64le-unknown-linux-gnu, with only the specified regression. Is this ok for trunk? Thanks, Bill [gcc] 2016-11-15 Bill Schmidt PR tree-optimization/77848 * tree-if-conv.c (version_loop_for_if_conversion): When versioning an outer loop, only save basic block aux information for the inner loop. (versionable_outer_loop_p): New function. (tree_if_conversion): Always version a loop when vectorization is enabled; version the outer loop instead of the inner one if the pattern will be recognized for outer-loop vectorization. [gcc/testsuite] 2016-11-15 Bill Schmidt PR tree-optimization/77848 * gfortran.dg/vect/pr78848.f: New test. Index: gcc/testsuite/gfortran.dg/vect/pr77848.f =================================================================== --- gcc/testsuite/gfortran.dg/vect/pr77848.f (revision 0) +++ gcc/testsuite/gfortran.dg/vect/pr77848.f (working copy) @@ -0,0 +1,24 @@ +! PR 77848: Verify versioning is on when vectorization fails +! { dg-do compile } +! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" } + + subroutine sub(x,a,n,m) + implicit none + real*8 x(*),a(*),atemp + integer i,j,k,m,n + real*8 s,t,u,v + do j=1,m + atemp=0.d0 + do i=1,n + if (abs(a(i)).gt.atemp) then + atemp=a(i) + k = i + end if + enddo + call dummy(atemp,k) + enddo + return + end + +! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } } +! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } } Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c (revision 242412) +++ gcc/tree-if-conv.c (working copy) @@ -2533,6 +2533,7 @@ version_loop_for_if_conversion (struct loop *loop) struct loop *new_loop; gimple *g; gimple_stmt_iterator gsi; + unsigned int save_length; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), @@ -2540,8 +2541,9 @@ version_loop_for_if_conversion (struct loop *loop) gimple_call_set_lhs (g, cond); /* Save BB->aux around loop_version as that uses the same field. */ - void **saved_preds = XALLOCAVEC (void *, loop->num_nodes); - for (unsigned i = 0; i < loop->num_nodes; i++) + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + void **saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) saved_preds[i] = ifc_bbs[i]->aux; initialize_original_copy_tables (); @@ -2550,7 +2552,7 @@ version_loop_for_if_conversion (struct loop *loop) REG_BR_PROB_BASE, true); free_original_copy_tables (); - for (unsigned i = 0; i < loop->num_nodes; i++) + for (unsigned i = 0; i < save_length; i++) ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) @@ -2565,6 +2567,40 @@ version_loop_for_if_conversion (struct loop *loop) return true; } +/* Return true when LOOP satisfies the follow conditions that will + allow it to be recognized by the vectorizer for outer-loop + vectorization: + - The loop has exactly one inner loop. + - The loop has a single exit. + - The loop header has a single successor, which is the inner + loop header. + - The loop exit block has a single predecessor, which is the + inner loop's exit block. */ + +static bool +versionable_outer_loop_p (struct loop *loop) +{ + if (!loop->inner + || loop->inner->next + || !single_exit (loop) + || !single_succ_p (loop->header) + || single_succ (loop->header) != loop->inner->header) + return false; + + gcc_assert (single_pred_p (loop->latch)); + basic_block outer_exit = single_pred (loop->latch); + gcc_assert (single_pred_p (loop->inner->latch)); + basic_block inner_exit = single_pred (loop->inner->latch); + + if (!single_pred_p (outer_exit) || single_pred (outer_exit) != inner_exit) + return false; + + if (dump_file) + fprintf (dump_file, "Found versionable outer loop\n"); + + return true; +} + /* Performs splitting of critical edges. Skip splitting and return false if LOOP will not be converted because: @@ -2767,8 +2803,16 @@ tree_if_conversion (struct loop *loop) || loop->dont_vectorize)) goto cleanup; - if ((any_pred_load_store || any_complicated_phi) - && !version_loop_for_if_conversion (loop)) + /* Since we have no cost model, always version loops if vectorization + is enabled. Either version this loop, or if the pattern is right + for outer-loop vectorization, version the outer loop. In the + latter case we will still if-convert the original inner loop. */ + /* FIXME: When SLP vectorization can handle if-conversion on its own, + predicate all of if-conversion on flag_tree_loop_vectorize. */ + if ((any_pred_load_store || any_complicated_phi || flag_tree_loop_vectorize) + && !version_loop_for_if_conversion + (versionable_outer_loop_p (loop_outer (loop)) + ? loop_outer (loop) : loop)) goto cleanup; /* Now all statements are if-convertible. Combine all the basic