From patchwork Thu Apr 21 16:09:37 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Enkovich <enkovich.gnu@gmail.com>
X-Patchwork-Id: 613181
Return-Path: 
 <gcc-patches-return-425350-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3qrNyt5lCMz9sDk
	for <incoming@patchwork.ozlabs.org>;
	Fri, 22 Apr 2016 02:11:02 +1000 (AEST)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=fhusedxv; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:subject:message-id:mime-version:content-type; q=dns; s=
	default; b=Rjt7ZWm35DCtQfB2axLTRPXOIu2By9XkvD2bPxWoaVlCeiq+7Wxww
	Mk4zTdKRzv6HpZ9c2TKAOVHpnQUWy+1fWfSsC4mOBkfLcqkiygUPAtx9kk5j+xrt
	DYkhM+ssORTpdnq9/7lZFodsO8s7CfuX982fMLSa/GGPyEFaYpBss8=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:subject:message-id:mime-version:content-type; s=
	default; bh=bn3FJnXik8PT73zdVmLvBlkieN0=; b=fhusedxvFXgamlZeKCmo
	yN4NNrj24MycYjPYKTj+KAHuiJusa+wxZa6tVBPuCORkFwLLUflH9jOQImi/1pa/
	z7/KhDjig0l51LlJjpk8QPcuAHoZASddSdNhtSMFkYgec9aAy5vHo2RWQgwyUHfp
	MudwzK2K8te6C7lyT7u6Axg=
Received: (qmail 70302 invoked by alias); 21 Apr 2016 16:10:51 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 70291 invoked by uid 89); 21 Apr 2016 16:10:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00,
	FREEMAIL_FROM, RCVD_IN_DNSWL_LOW,
	SPF_PASS autolearn=ham version=3.3.2 spammy=ffastmath,
	ffast-math
X-HELO: mail-qk0-f194.google.com
Received: from mail-qk0-f194.google.com (HELO mail-qk0-f194.google.com)
	(209.85.220.194) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
	encrypted) ESMTPS; Thu, 21 Apr 2016 16:10:40 +0000
Received: by mail-qk0-f194.google.com with SMTP id q184so4170598qkf.0 for
	<gcc-patches@gcc.gnu.org>; Thu, 21 Apr 2016 09:10:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
	s=20130820;
	h=x-gm-message-state:date:from:to:subject:message-id:mime-version
	:content-disposition:user-agent;
	bh=zklxk92EENJHJrK6jyHQJmYRQhZ74jEu1qTYa7ruUE4=;
	b=R8Z9vOx2RSK4OWiT/d7N63pOA9kzRbd2U2vr8Pagyn7c0R6u1JQ4V90lR2FyXVw9yL
	XXWKeCrxb/5dxKs494nP+ukdijeouydxeEhy2Cn0lNwo8o7cw6WrHUesZbgD9pXconGf
	NIQGJs9da2i+F6kl6B1Pe5iCkU/YmFYuK3kpYCyUpeTNyQQlzPYR6U+p/QCzI8BW2tE4
	W8yBoP3Il34nC+QS9X2HO3q33t3f4rT0hbipPI6VikQeZw2vsjz3KfLnNdPDI2EqULIe
	VxkDRtFUloCdTpcaQLXZVX0XrlSQ8vz2mqydEjJJL0uwCr0WcTNGLL0WZLdcoLuBHFbf
	0QfA==
X-Gm-Message-State: 
 AOPr4FVikUQtFdtGCOWGkdq92FfDsHcZphb3bIACMKHmlvJCVnlNGeoi5oAlM20Wt4o5Ng==
X-Received: by 10.55.21.34 with SMTP id f34mr6062640qkh.101.1461255038477;
	Thu, 21 Apr 2016 09:10:38 -0700 (PDT)
Received: from msticlxl57.ims.intel.com (irdmzpr02-ext.ir.intel.com.
	[192.198.151.37]) by smtp.gmail.com with ESMTPSA id
	b6sm564899qkh.12.2016.04.21.09.10.36 for
	<gcc-patches@gcc.gnu.org> (version=TLS1_2
	cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 21 Apr 2016 09:10:37 -0700 (PDT)
Date: Thu, 21 Apr 2016 19:09:37 +0300
From: Ilya Enkovich <enkovich.gnu@gmail.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized
	loops
Message-ID: <20160421160937.GB7047@msticlxl57.ims.intel.com>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
X-IsSubscribed: yes

Hi,

Currently, when a loop is vectorized, we adjust its
nb_iterations_upper_bound by dividing it by VF.  This is incorrect
because nb_iterations_upper_bound is an upper bound for
(<number of loop iterations> - 1), so simply dividing it by VF in many
cases gives a bound greater than the real one.  The correct value is
((nb_iterations_upper_bound + 1) / VF - 1).

Also, the decrement due to peeling for gaps should happen before we scale
the bound by VF, because peeling applies to the scalar loop, not the
vectorized one.

This patch modifies the nb_iterations_upper_bound computation to resolve
these issues.
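
To make the difference concrete, here is a minimal standalone sketch of
the two computations.  It is not the GCC code: old_bound and new_bound
are hypothetical helpers, plain signed longs stand in for widest_int,
and the sketch assumes vf > 0.  The bound passed in follows the
convention above, i.e. it bounds (<number of loop iterations> - 1).

```c
#include <assert.h>

/* Hypothetical model of the adjustment before the patch: divide the
   bound by VF first, then apply the peeling-for-gaps decrement.  */
static long
old_bound (long nb_upper, long vf, int peel_for_gaps)
{
  assert (vf > 0);
  long bound = nb_upper / vf;
  if (peel_for_gaps && bound != 0)
    bound -= 1;
  return bound;
}

/* Hypothetical model of the adjustment after the patch: apply the
   peeling-for-gaps decrement to the scalar bound, then convert
   "iterations - 1" to iterations, divide by VF, and convert back.
   If fewer than VF scalar iterations are possible, the result is -1.  */
static long
new_bound (long nb_upper, long vf, int peel_for_gaps)
{
  assert (vf > 0);
  if (peel_for_gaps && nb_upper != 0)
    nb_upper -= 1;
  return (nb_upper + 1) / vf - 1;
}
```

For a scalar loop with at most 127 iterations (a bound of 126) and
VF=64, old_bound gives 1, which allows two vector iterations, while
new_bound gives 0, which allows exactly one.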

Regression testing showed one failure, caused by a loop that is now
optimized.  Here is the loop:

foo (signed char s)
{
  signed char i;
  for (i = 0; i < s; i++)
    yy[i] = (signed int) i;
}

Here we vectorize for AVX512 using VF=64.  The original loop has at most
127 iterations, so the vectorized loop can execute only once.  With the
patch applied, the compiler detects this and transforms the loop into a
BB with just stores of constant vectors into yy.  The test was adjusted
to increase the number of possible iterations.  A copy of the original
test was added to check that we can optimize out the original loop.
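
As a rough check of the numbers above, the sketch below computes how
many full vector iterations the counter type alone allows for a loop of
the form "for (i = 0; i < s; i++)".  max_vector_iterations is a
hypothetical helper, not part of GCC, and it only models the limit
imposed by the type of i and s; other sources of iteration bounds are
ignored.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: with i and s of a type whose maximum value is
   TYPE_MAX, the scalar loop runs at most TYPE_MAX times
   (i = 0 .. TYPE_MAX - 1), so at most TYPE_MAX / VF full vector
   iterations are possible.  */
static long
max_vector_iterations (long type_max, long vf)
{
  assert (vf > 0);
  return type_max / vf;
}
```

With SCHAR_MAX equal to 127 and VF=64 this gives 1, matching the
single vector iteration described above.  With an int counter, as in
the adjusted test, the type no longer limits the loop to one vector
iteration.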

Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?

Thanks,
Ilya
---
gcc/

2016-04-21  Ilya Enkovich  <ilya.enkovich@intel.com>

	* tree-vect-loop.c (vect_transform_loop): Fix
	nb_iterations_upper_bound computation for vectorized loop.

gcc/testsuite/

2016-04-21  Ilya Enkovich  <ilya.enkovich@intel.com>

	* gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
	optimization of vector loop.
	* gcc.target/i386/vect-unpack-3.c: New test.

diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
index 4825248..51c518e 100644
--- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
@@ -6,19 +6,22 @@
 
 #define N 120
 signed int yy[10000];
+signed char zz[10000];
 
 void
-__attribute__ ((noinline)) foo (signed char s)
+__attribute__ ((noinline,noclone)) foo (int s)
 {
-   signed char i;
+   int i;
    for (i = 0; i < s; i++)
-     yy[i] = (signed int) i;
+     yy[i] = zz[i];
 }
 
 void
 avx512bw_test ()
 {
   signed char i;
+  for (i = 0; i < N; i++)
+    zz[i] = i;
   foo (N);
   for (i = 0; i < N; i++)
     if ( (signed int)i != yy [i] )
diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
new file mode 100644
index 0000000..eb8a93e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math -mavx512bw -save-temps" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+#define N 120
+signed int yy[10000];
+
+void
+__attribute__ ((noinline)) foo (signed char s)
+{
+   signed char i;
+   for (i = 0; i < s; i++)
+     yy[i] = (signed int) i;
+}
+
+void
+avx512bw_test ()
+{
+  signed char i;
+  foo (N);
+  for (i = 0; i < N; i++)
+    if ( (signed int)i != yy [i] )
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d813b86..da98211 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
 		      expected_iterations / vectorization_factor);
-  loop->nb_iterations_upper_bound
-    = wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor);
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
       && loop->nb_iterations_upper_bound != 0)
     loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
+  loop->nb_iterations_upper_bound
+    = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
+		      vectorization_factor) - 1;
+
   if (loop->any_estimate)
     {
       loop->nb_iterations_estimate