From patchwork Wed May 22 02:04:37 2019
X-Patchwork-Submitter: Kugan Vivekanandarajah
X-Patchwork-Id: 1103091
Mailing-List: gcc-patches@gcc.gnu.org; run by ezmlm
From: Kugan Vivekanandarajah
Date: Wed, 22 May 2019 12:04:37 +1000
Subject: [RFC][PR88838][SVE] Use 32-bit WHILELO in LP64 mode
To: GCC Patches

Hi,

The attached RFC patch attempts to use a 32-bit WHILELO in LP64 mode to
fix the PR.  Bootstrap and regression testing are ongoing.  In earlier
testing, I ran into an issue related to fwprop; I will tackle that based
on the feedback on this patch.
Thanks,
Kugan

From 4e9837ff9c0c080923f342e83574a6fdba2b3d92 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah
Date: Tue, 5 Mar 2019 10:01:45 +1100
Subject: [PATCH] pr88838[v2]

As mentioned in PR88838, this patch avoids the SXTW by using WHILELO on
W registers instead of X registers.

As mentioned in the PR, vect_verify_full_masking checks which IV widths
are supported for WHILELO but prefers to go to Pmode width.  This is
because using Pmode allows ivopts to reuse the IV for indices (as in
the loads and stores above).  However, it would be better to use a
32-bit WHILELO with a truncated 64-bit IV if:

(a) the limit is extended from 32 bits, and
(b) the detection loop in vect_verify_full_masking detects that using a
    32-bit IV would be correct.

gcc/ChangeLog:

2019-05-22  Kugan Vivekanandarajah

	* tree-vect-loop-manip.c (vect_set_loop_masks_directly): If
	compare_type does not have Pmode size, create an IV of Pmode size
	and truncate its uses (i.e. convert them to the correct type).
	* tree-vect-loop.c (vect_verify_full_masking): Find which IV
	widths are supported for WHILELO.

gcc/testsuite/ChangeLog:

2019-05-22  Kugan Vivekanandarajah

	* gcc.target/aarch64/pr88838.c: New test.
	* gcc.target/aarch64/sve/while_1.c: Adjust.

Change-Id: Iff52946c28d468078f2cc0868d53edb05325b8ca
---
 gcc/fwprop.c                                   | 13 +++++++
 gcc/testsuite/gcc.target/aarch64/pr88838.c     | 11 ++++++
 gcc/testsuite/gcc.target/aarch64/sve/while_1.c | 16 ++++----
 gcc/tree-vect-loop-manip.c                     | 52 ++++++++++++++++++++++++--
 gcc/tree-vect-loop.c                           | 39 ++++++++++++++++++-
 5 files changed, 117 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88838.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index cf2c9de..5275ad3 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1358,6 +1358,19 @@ forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_set)
   else
     mode = GET_MODE (*loc);
 
+  /* TODO.  */
+  if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (reg)))
+    return false;
+  /* TODO.  We can't get the mode for
+     (set (reg:VNx16BI 109)
+          (unspec:VNx16BI [
+             (reg:SI 131)
+             (reg:SI 106)
+          ] UNSPEC_WHILE_LO))
+     Thus, bail out when it is an UNSPEC and the modes are not compatible.  */
+  if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (reg))
+      && GET_CODE (SET_SRC (use_set)) == UNSPEC)
+    return false;
   new_rtx = propagate_rtx (*loc, mode, reg, src,
 			   optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_insn)));
diff --git a/gcc/testsuite/gcc.target/aarch64/pr88838.c b/gcc/testsuite/gcc.target/aarch64/pr88838.c
new file mode 100644
index 0000000..9d03c0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr88838.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-S -O3 -march=armv8.2-a+sve" } */
+
+void
+f (int *restrict x, int *restrict y, int *restrict z, int n)
+{
+  for (int i = 0; i < n; i += 1)
+    x[i] = y[i] + z[i];
+}
+
+/* { dg-final { scan-assembler-not "sxtw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/while_1.c b/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
index a93a04b..05a4860 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
@@ -26,14 +26,14 @@ TEST_ALL (ADD_LOOP)
 
 /* { dg-final { scan-assembler-not {\tuqdec} } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, xzr,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, x[0-9]+,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, xzr,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, x[0-9]+,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, xzr,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, x[0-9]+,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, xzr,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, x[0-9]+,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, wzr,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, w[0-9]+,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, wzr,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, w[0-9]+,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, wzr,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, w[0-9]+,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, wzr,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, w[0-9]+,} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z, \[x0, x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7], \[x0, x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, \[x0, x[0-9]+, lsl 1\]\n} 2 } } */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 77d3dac..d6452a1 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -418,7 +418,20 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
   tree mask_type = rgm->mask_type;
   unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
   poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
-
+  bool convert = false;
+  tree iv_type = NULL_TREE;
+
+  /* If compare_type does not have Pmode size, create an IV of Pmode size
+     with truncated uses (i.e. converted to the correct type).  This is
+     because using Pmode allows ivopts to reuse the IV for indices (in the
+     loads and stores).  */
+  if (known_lt (GET_MODE_BITSIZE (TYPE_MODE (compare_type)),
+		GET_MODE_BITSIZE (Pmode)))
+    {
+      iv_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (Pmode),
+						true);
+      convert = true;
+    }
   /* Calculate the maximum number of scalar values that the rgroup
      handles in total, the number that it handles for each iteration
      of the vector loop, and the number that it should skip during the
@@ -444,12 +457,43 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
      processed.  */
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
+  gimple_stmt_iterator incr_gsi2;
   bool insert_after;
-  tree zero_index = build_int_cst (compare_type, 0);
+  tree zero_index;
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
-  create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
-	     insert_after, &index_before_incr, &index_after_incr);
+  if (convert)
+    {
+      /* We are creating an IV of Pmode type and converting its uses.  */
+      zero_index = build_int_cst (iv_type, 0);
+      tree step = build_int_cst (iv_type,
+				 LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+      /* Create an IV of Pmode type.  */
+      create_iv (zero_index, step, NULL_TREE, loop, &incr_gsi,
+		 insert_after, &index_before_incr, &index_after_incr);
+      /* Create truncated index_before and index_after the increment.  */
+      tree index_before_incr_trunc = make_ssa_name (compare_type);
+      tree index_after_incr_trunc = make_ssa_name (compare_type);
+      gimple *incr_before_stmt = gimple_build_assign (index_before_incr_trunc,
+						      NOP_EXPR,
+						      index_before_incr);
+      gimple *incr_after_stmt = gimple_build_assign (index_after_incr_trunc,
+						     NOP_EXPR,
+						     index_after_incr);
+      incr_gsi2 = incr_gsi;
+      gsi_insert_before (&incr_gsi2, incr_before_stmt, GSI_NEW_STMT);
+      gsi_insert_after (&incr_gsi, incr_after_stmt, GSI_NEW_STMT);
+      index_before_incr = index_before_incr_trunc;
+      index_after_incr = index_after_incr_trunc;
+      zero_index = build_int_cst (compare_type, 0);
+    }
+  else
+    {
+      /* The IV is of Pmode compare_type; no conversion is needed.  */
+      zero_index = build_int_cst (compare_type, 0);
+      create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
+		 insert_after, &index_before_incr, &index_after_incr);
+    }
   tree test_index, test_limit, first_limit;
   gimple_stmt_iterator *test_gsi;
   if (might_wrap_p)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index bd81193..2769c86 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1035,6 +1035,30 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
   /* Find a scalar mode for which WHILE_ULT is supported.  */
   opt_scalar_int_mode cmp_mode_iter;
   tree cmp_type = NULL_TREE;
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
+  unsigned HOST_WIDE_INT max_vf = vect_max_vf (loop_vinfo);
+  widest_int iv_limit;
+  bool known_max_iters = max_loop_iterations (loop, &iv_limit);
+  if (known_max_iters)
+    {
+      if (niters_skip)
+	{
+	  /* Add the maximum number of skipped iterations to the
+	     maximum iteration count.  */
+	  if (TREE_CODE (niters_skip) == INTEGER_CST)
+	    iv_limit += wi::to_widest (niters_skip);
+	  else
+	    iv_limit += max_vf - 1;
+	}
+      /* IV_LIMIT is the maximum number of latch iterations, which is also
+	 the maximum in-range IV value.  Round this value down to the previous
+	 vector alignment boundary and then add an extra full iteration.  */
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      iv_limit = (iv_limit & -(int) known_alignment (vf)) + max_vf;
+    }
+
+  /* Get the vectorization factor in tree form.  */
 FOR_EACH_MODE_IN_CLASS (cmp_mode_iter, MODE_INT)
@@ -1045,12 +1069,23 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
       if (this_type
 	  && can_produce_all_loop_masks_p (loop_vinfo, this_type))
 	{
+	  /* See whether a zero-based IV would ever generate all-false masks
+	     before wrapping around.  */
+	  bool might_wrap_p
+	    = (!known_max_iters
+	       || (wi::min_precision
+		   (iv_limit
+		    * vect_get_max_nscalars_per_iter (loop_vinfo),
+		    UNSIGNED) > cmp_bits));
 	  /* Although we could stop as soon as we find a valid mode,
 	     it's often better to continue until we hit Pmode, since the
 	     operands to the WHILE are more likely to be reusable in
-	     address calculations.  */
+	     address calculations, unless the limit is extended from
+	     this_type.  */
 	  cmp_type = this_type;
-	  if (cmp_bits >= GET_MODE_BITSIZE (Pmode))
+	  if (cmp_bits >= GET_MODE_BITSIZE (Pmode)
+	      || (!might_wrap_p
+		  && (cmp_bits == TYPE_PRECISION (niters_type))))
 	    break;
 	}
     }
-- 
2.7.4