From patchwork Thu Jun 25 11:51:54 2015
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 488420
From: Richard Sandiford
To: Richard Biener
Cc: GCC Patches
Subject: Re: Remove redundant AND from count reduction loop
References: <87pp4m8mkp.fsf@e105548-lin.cambridge.arm.com>
 <87egl1sa2p.fsf@e105548-lin.cambridge.arm.com>
 <87a8vps6p1.fsf@e105548-lin.cambridge.arm.com>
 <871th1s322.fsf@e105548-lin.cambridge.arm.com>
 <87twtxqlbq.fsf@e105548-lin.cambridge.arm.com>
 <87pp4kqk4l.fsf@e105548-lin.cambridge.arm.com>
Date: Thu, 25 Jun 2015 12:51:54 +0100
In-Reply-To: (Richard Biener's message of "Thu, 25 Jun 2015 11:05:05 +0100")
Message-ID: <87lhf8qa3p.fsf@e105548-lin.cambridge.arm.com>

Richard Biener writes:
> On Thu, Jun 25, 2015 at 10:15 AM, Richard Sandiford
>> Index: gcc/match.pd
>> ===================================================================
>> --- gcc/match.pd	2015-06-24 20:24:31.344998571 +0100
>> +++ gcc/match.pd	2015-06-24 20:24:31.340998617 +0100
>> @@ -1014,6 +1014,26 @@ along with GCC; see the file COPYING3.
>>      (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>>      (cnd @0 @2 @1)))
>>
>> +/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C), since vector comparisons
>> +   return all-1 or all-0 results.  */
>> +/* ??? We could instead convert all instances of the vec_cond to negate,
>> +   but that isn't necessarily a win on its own.  */
>> +(simplify
>> + (plus:c @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (VECTOR_TYPE_P (type)
>> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>> +      && (TYPE_MODE (TREE_TYPE (type))
>> +          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
>> +  (minus @3 (view_convert @0))))
>> +
>> +/* ... likewise A - (B vcmp C ? 1 : 0) -> A + (B vcmp C).  */
>> +(simplify
>> + (minus @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
>> + (if (VECTOR_TYPE_P (type)
>> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
>> +      && (TYPE_PRECISION (TREE_TYPE (type))
>> +          == TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@0)))))
>
> Either TYPE_PRECISION or TYPE_MODE please ;)

Bah.  The main reason I hate cut-&-paste is that I'm so hopeless at it.

> I think that TYPE_MODE is more correct if you consider (minus V4SF
> (view_convert:V4SF (vec_cond V4SI V4SI V4SI)) where you would end up
> with a non-sensical TYPE_PRECISION query on V4SF.  So probably
> VECTOR_INTEGER_TYPE_P again, then TYPE_PRECISION is good.

Actually, they were both meant to be TYPE_MODE, as below.  Is this OK?

Thanks,
Richard


gcc/
	* match.pd: Add patterns for vec_conds between 1 and 0.

gcc/testsuite/
	* gcc.target/aarch64/vect-add-sub-cond.c: New test.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2015-06-25 11:06:50.462827031 +0100
+++ gcc/match.pd	2015-06-25 11:07:23.742445798 +0100
@@ -1014,6 +1014,26 @@ along with GCC; see the file COPYING3.
     (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
     (cnd @0 @2 @1)))
 
+/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C), since vector comparisons
+   return all-1 or all-0 results.  */
+/* ??? We could instead convert all instances of the vec_cond to negate,
+   but that isn't necessarily a win on its own.  */
+(simplify
+ (plus:c @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
+ (if (VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+      && (TYPE_MODE (TREE_TYPE (type))
+          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
+  (minus @3 (view_convert @0))))
+
+/* ... likewise A - (B vcmp C ? 1 : 0) -> A + (B vcmp C).  */
+(simplify
+ (minus @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
+ (if (VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+      && (TYPE_MODE (TREE_TYPE (type))
+          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))
+  (plus @3 (view_convert @0))))
 
 /* Simplifications of comparisons.  */
Index: gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c
===================================================================
--- /dev/null	2015-06-02 17:27:28.541944012 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c	2015-06-25 11:06:50.458827055 +0100
@@ -0,0 +1,94 @@
+/* Make sure that vector comparison results are not unnecessarily ANDed
+   with vectors of 1.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define COUNT1(X) if (X) count += 1
+#define COUNT2(X) if (X) count -= 1
+#define COUNT3(X) count += (X)
+#define COUNT4(X) count -= (X)
+
+#define COND1(X) (X)
+#define COND2(X) ((X) ? 1 : 0)
+#define COND3(X) ((X) ? -1 : 0)
+#define COND4(X) ((X) ? 0 : 1)
+#define COND5(X) ((X) ? 0 : -1)
+
+#define TEST_LT(X, Y) ((X) < (Y))
+#define TEST_LE(X, Y) ((X) <= (Y))
+#define TEST_GT(X, Y) ((X) > (Y))
+#define TEST_GE(X, Y) ((X) >= (Y))
+#define TEST_EQ(X, Y) ((X) == (Y))
+#define TEST_NE(X, Y) ((X) != (Y))
+
+#define COUNT_LOOP(ID, TYPE, CMP_ARRAY, TEST, COUNT) \
+  TYPE \
+  reduc_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    TYPE count = 0; \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      COUNT (TEST (CMP_ARRAY[i], x)); \
+    return count; \
+  }
+
+#define COND_LOOP(ID, ARRAY, CMP_ARRAY, TEST, COND) \
+  void \
+  plus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] += COND (TEST (CMP_ARRAY[i], x)); \
+  } \
+  void \
+  plusc_##ID (void) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] += COND (TEST (CMP_ARRAY[i], 10)); \
+  } \
+  void \
+  minus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] -= COND (TEST (CMP_ARRAY[i], x)); \
+  } \
+  void \
+  minusc_##ID (void) \
+  { \
+    for (unsigned int i = 0; i < 1024; ++i) \
+      ARRAY[i] += COND (TEST (CMP_ARRAY[i], 1)); \
+  }
+
+#define ALL_LOOPS(ID, ARRAY, CMP_ARRAY, TEST) \
+  typedef __typeof__(ARRAY[0]) ID##_type; \
+  COUNT_LOOP (ID##_1, ID##_type, CMP_ARRAY, TEST, COUNT1) \
+  COUNT_LOOP (ID##_2, ID##_type, CMP_ARRAY, TEST, COUNT2) \
+  COUNT_LOOP (ID##_3, ID##_type, CMP_ARRAY, TEST, COUNT3) \
+  COUNT_LOOP (ID##_4, ID##_type, CMP_ARRAY, TEST, COUNT4) \
+  COND_LOOP (ID##_1, ARRAY, CMP_ARRAY, TEST, COND1) \
+  COND_LOOP (ID##_2, ARRAY, CMP_ARRAY, TEST, COND2) \
+  COND_LOOP (ID##_3, ARRAY, CMP_ARRAY, TEST, COND3) \
+  COND_LOOP (ID##_4, ARRAY, CMP_ARRAY, TEST, COND4) \
+  COND_LOOP (ID##_5, ARRAY, CMP_ARRAY, TEST, COND5)
+
+signed int asi[1024] __attribute__ ((aligned (16)));
+unsigned int aui[1024] __attribute__ ((aligned (16)));
+signed long long asl[1024] __attribute__ ((aligned (16)));
+unsigned long long aul[1024] __attribute__ ((aligned (16)));
+float af[1024] __attribute__ ((aligned (16)));
+double ad[1024] __attribute__ ((aligned (16)));
+
+ALL_LOOPS (si_si, aui, asi, TEST_LT)
+ALL_LOOPS (ui_si, aui, asi, TEST_LE)
+ALL_LOOPS (si_ui, aui, asi, TEST_GT)
+ALL_LOOPS (ui_ui, aui, asi, TEST_GE)
+ALL_LOOPS (sl_sl, asl, asl, TEST_NE)
+ALL_LOOPS (ul_ul, aul, aul, TEST_EQ)
+ALL_LOOPS (si_f, asi, af, TEST_LE)
+ALL_LOOPS (ui_f, aui, af, TEST_GT)
+ALL_LOOPS (sl_d, asl, ad, TEST_GE)
+ALL_LOOPS (ul_d, aul, ad, TEST_GT)
+
+/* { dg-final { scan-assembler-not "\tand\t" } } */
+/* { dg-final { scan-assembler-not "\tld\[^\t\]*\t\[wx\]" } } */
+/* { dg-final { scan-assembler-not "\tst\[^\t\]*\t\[wx\]" } } */
+/* { dg-final { scan-assembler "\tldr\tq" } } */
+/* { dg-final { scan-assembler "\tstr\tq" } } */
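
For anyone reading along who wants to convince themselves that the two
match.pd folds are value-preserving, here is a minimal scalar sketch in C.
It is not part of the patch: LANES, lane_less_than and check_identity are
hypothetical names made up for the illustration, and the "vector" is
modelled as a plain array of int32_t lanes, on the assumption that a vector
comparison yields -1 (all ones) in a lane where the test holds and 0
elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4   /* hypothetical lane count, chosen for illustration */

/* Model a vector comparison B < C: each lane of MASK becomes all-ones
   (-1) when the test holds and 0 otherwise.  */
static void
lane_less_than (const int32_t *b, const int32_t *c, int32_t *mask)
{
  for (int i = 0; i < LANES; ++i)
    mask[i] = b[i] < c[i] ? -1 : 0;
}

/* Check both folds lane by lane:
     A + (mask ? 1 : 0) == A - mask
     A - (mask ? 1 : 0) == A + mask
   The "? 1 : 0" form is written as an AND with 1, which is the
   redundant operation the patch is trying to remove.  */
static void
check_identity (const int32_t *a, const int32_t *b, const int32_t *c)
{
  int32_t mask[LANES];
  lane_less_than (b, c, mask);
  for (int i = 0; i < LANES; ++i)
    {
      int32_t one_or_zero = mask[i] & 1;
      assert (a[i] + one_or_zero == a[i] - mask[i]);
      assert (a[i] - one_or_zero == a[i] + mask[i]);
    }
}
```

Calling check_identity on a few small arrays exercises both the true and
false lanes. The sketch only covers the arithmetic; the element-count and
TYPE_MODE checks in the patterns are what make the view_convert of the
comparison result safe in GCC itself.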