From patchwork Wed Jun 15 08:52:06 2016
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 635761
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [5/7] Move the fix for PR65518
Date: Wed, 15 Jun 2016 09:52:06 +0100
Message-ID: <87poriltqh.fsf@e105548-lin.cambridge.arm.com>
In-Reply-To: <87d1nin8hz.fsf@e105548-lin.cambridge.arm.com>

This patch moves the fix for PR65518 to the code that checks whether
load-and-permute operations are supported.  If the group size is
greater than the vectorisation factor, it would still be possible to
fall back to elementwise loads (as for strided groups) rather than
fail vectorisation entirely.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
	* tree-vectorizer.h (vect_grouped_load_supported): Add a
	single_element_p parameter.
	* tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
	Check the PR65518 case here rather than in vectorizable_load.
	* tree-vect-loop.c (vect_analyze_loop_2): Update call accordingly.
	* tree-vect-stmts.c (vectorizable_load): Likewise.
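To make the PR65518 case concrete, here is a small stand-alone C sketch (not GCC code; the helper `used_vector_loads` and the 4-element vector width used below are assumptions for illustration).  A single-element group of size COUNT reads only every COUNT-th scalar, as in `a[i * COUNT]`.  Producing one vector of NUNITS results from contiguous vector loads takes COUNT loads; the helper counts how many of those loads contain at least one element that is actually used.

```c
#include <assert.h>

/* Hypothetical helper: a single-element group of size COUNT uses scalar
   elements 0, COUNT, 2 * COUNT, ..., (NUNITS - 1) * COUNT.  Covering them
   with contiguous vector loads of NUNITS elements takes COUNT loads.
   Return how many of those loads contain at least one used element.  */
static unsigned
used_vector_loads (unsigned nunits, unsigned count)
{
  assert (nunits > 0 && count > 0);
  unsigned used = 0;
  for (unsigned k = 0; k < count; k++)
    {
      /* Vector load K covers elements [k * nunits, (k + 1) * nunits).  */
      unsigned lo = k * nunits, hi = lo + nunits;
      for (unsigned j = 0; j < nunits; j++)
	if (j * count >= lo && j * count < hi)
	  {
	    used++;
	    break;
	  }
    }
  return used;
}
```

With NUNITS = 4, `used_vector_loads (4, 8)` returns 4, so half of the eight loads bring in nothing the loop needs, whereas for any COUNT <= NUNITS every load contains a used element.  That is the situation the `count > TYPE_VECTOR_SUBPARTS (vectype)` test rejects.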
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -1069,7 +1069,7 @@ extern tree bump_vector_ptr (tree, gimple *, gimple_stmt_iterator *, gimple *,
 extern tree vect_create_destination_var (tree, tree);
 extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT);
 extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT);
-extern bool vect_grouped_load_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT);
 extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT);
 extern void vect_permute_store_chain (vec<tree> ,unsigned int, gimple *,
 				      gimple_stmt_iterator *, vec<tree> *);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c
+++ gcc/tree-vect-data-refs.c
@@ -5131,14 +5131,31 @@ vect_setup_realignment (gimple *stmt, gimple_stmt_iterator *gsi,
 
 /* Function vect_grouped_load_supported.
 
-   Returns TRUE if even and odd permutations are supported,
-   and FALSE otherwise.  */
+   COUNT is the size of the load group (the number of statements plus the
+   number of gaps).  SINGLE_ELEMENT_P is true if there is actually
+   only one statement, with a gap of COUNT - 1.
+
+   Returns true if a suitable permute exists.  */
 
 bool
-vect_grouped_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
+vect_grouped_load_supported (tree vectype, bool single_element_p,
+			     unsigned HOST_WIDE_INT count)
 {
   machine_mode mode = TYPE_MODE (vectype);
 
+  /* If this is single-element interleaving with an element distance
+     that leaves unused vector loads around punt - we at least create
+     very sub-optimal code in that case (and blow up memory,
+     see PR65518).  */
+  if (single_element_p && count > TYPE_VECTOR_SUBPARTS (vectype))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "single-element interleaving not supported "
+			 "for not adjacent vector loads\n");
+      return false;
+    }
+
   /* vect_permute_load_chain requires the group size to be equal to 3 or
      be a power of two.  */
   if (count != 3 && exact_log2 (count) == -1)
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c
+++ gcc/tree-vect-loop.c
@@ -2148,10 +2148,12 @@ again:
 	    {
 	      vinfo = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
 	      vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo));
+	      bool single_element_p = !STMT_VINFO_GROUP_NEXT_ELEMENT (vinfo);
 	      size = STMT_VINFO_GROUP_SIZE (vinfo);
 	      vectype = STMT_VINFO_VECTYPE (vinfo);
 	      if (! vect_load_lanes_supported (vectype, size)
-		  && ! vect_grouped_load_supported (vectype, size))
+		  && ! vect_grouped_load_supported (vectype, single_element_p,
+						    size))
 		return false;
 	    }
 	}
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -6298,31 +6298,20 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
       group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+      bool single_element_p = (first_stmt == stmt
+			       && !GROUP_NEXT_ELEMENT (stmt_info));
 
       if (!slp && !STMT_VINFO_STRIDED_P (stmt_info))
 	{
 	  if (vect_load_lanes_supported (vectype, group_size))
 	    load_lanes_p = true;
-	  else if (!vect_grouped_load_supported (vectype, group_size))
+	  else if (!vect_grouped_load_supported (vectype, single_element_p,
+						 group_size))
 	    return false;
 	}
 
-      /* If this is single-element interleaving with an element distance
-         that leaves unused vector loads around punt - we at least create
-	 very sub-optimal code in that case (and blow up memory,
-	 see PR65518).  */
-      if (first_stmt == stmt
-	  && !GROUP_NEXT_ELEMENT (stmt_info))
+      if (single_element_p)
 	{
-	  if (GROUP_SIZE (stmt_info) > TYPE_VECTOR_SUBPARTS (vectype))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "single-element interleaving not supported "
-				 "for not adjacent vector loads\n");
-	      return false;
-	    }
-
 	  /* Single-element interleaving requires peeling for gaps.  */
 	  gcc_assert (GROUP_GAP (stmt_info));
 	}
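For review purposes, the sketch below restates the new control flow in plain C so that it compiles on its own.  It is only a model under stated assumptions: `struct group_stmt`, `single_element_p` and `grouped_load_shape_ok` are hypothetical stand-ins for the GROUP_* accessors and for part of vect_grouped_load_supported, NUNITS plays the role of TYPE_VECTOR_SUBPARTS (vectype), and the target query for a suitable permute that the real function goes on to make is not modelled.  The tree-vect-loop.c caller computes the flag as !STMT_VINFO_GROUP_NEXT_ELEMENT on the first element of the group, which agrees with the vectorizable_load test whenever that test is applied to the first statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GROUP_FIRST_ELEMENT / GROUP_NEXT_ELEMENT
   links that chain together the statements of an interleaving group.  */
struct group_stmt
{
  const struct group_stmt *first;  /* First statement in the group.  */
  const struct group_stmt *next;   /* Next statement, or NULL.  */
};

/* Mirrors the test added to vectorizable_load: STMT is the first
   statement of its group and no other statement follows it.  */
static bool
single_element_p (const struct group_stmt *stmt)
{
  return stmt->first == stmt && stmt->next == NULL;
}

/* Models only the target-independent checks that the patched
   vect_grouped_load_supported makes before looking for a permute.
   NUNITS stands in for TYPE_VECTOR_SUBPARTS (vectype).  */
static bool
grouped_load_shape_ok (unsigned nunits, bool single_element, unsigned count)
{
  assert (nunits > 0 && count > 0);
  /* PR65518: a lone statement whose group is wider than one vector
     would leave whole vector loads unused.  */
  if (single_element && count > nunits)
    return false;
  /* vect_permute_load_chain needs a group size of 3 or a power of two.  */
  return count == 3 || (count & (count - 1)) == 0;
}
```

As the hunks show, both callers only reach vect_grouped_load_supported once vect_load_lanes_supported has failed, so after this patch the PR65518 rejection applies on the load-and-permute path rather than unconditionally in vectorizable_load.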