From patchwork Thu Sep 18 09:40:44 2014
X-Patchwork-Id: 390715
Subject: Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
From: Alan Lawrence
To: Jeff Law
CC: "gcc-patches@gcc.gnu.org"
Date: Thu, 18 Sep 2014 10:40:44 +0100
Message-ID: <541AA89C.9070005@arm.com>
In-Reply-To: <5409FBB1.3040509@redhat.com>

Thanks for the reply - and the in-depth investigation. I agree that compiler
correctness is the critical issue, rather than support for particular
configurations such as Ada on Alpha. Moreover, I think we both agree that if
result_mode == shift_mode, the transformation is correct. "Just" putting that
check in achieves what I'm trying for here, so I'd be happy to go with the
attached patch and call it a day.

However, I'm a little concerned about the other cases, i.e. where shift_mode
is wider than result_mode.
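To make that concrete, here is a minimal standalone sketch (my own
illustration, assuming 32-bit int, 64-bit long long, and GCC's arithmetic
right shift of signed values) of an SImode shift-of-XOR, its commuted form,
and the same computation performed in the wider DImode:

/* (ashiftrt (xor x -1) 31) in SImode, the commuted form, and the same
   shift performed in DImode then truncated back.  All three agree here;
   the question below is what condition on the XOR constant keeps the
   widened form safe to commute.  */
#include <stdio.h>

int
main (void)
{
  int x = 42;
  int narrow   = (~x) >> 31;                      /* SImode shift           */
  int commuted = ~(x >> 31);                      /* XOR and shift commuted */
  int widened  = (int) ((~(long long) x) >> 31);  /* shift done in DImode   */
  printf ("%d %d %d\n", narrow, commuted, widened);  /* prints: -1 -1 -1 */
  return 0;
}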
If I understand correctly (and I'm not sure about that, let's see how far I
get), this means we'll perform the shift in (say) DImode, when we're only
really concerned about the lower (say) 32 bits (for an originally-SImode
shift). try_widen_shift_mode will in this case check that the result of the
operation *inside* the shift (in our case an XOR) has 33 sign-bit copies (via
num_sign_bit_copies), i.e. that its *top* 32 bits are all equal to the
original SImode sign bit. Some of these bits may be fed into the top of the
desired SImode result by the DImode shift. Right so far?

AFAICT, num_sign_bit_copies for an XOR conservatively returns the minimum of
the num_sign_bit_copies of its two operands (a standalone sketch of this
counting rule is appended after the patch below). I'm not sure whether this
is behaviour we should rely on in its callers, or whether for the sake of
abstraction we should treat num_sign_bit_copies as a black box (which does
what it says on the, erm, tin).

If the former, doesn't having num_sign_bit_copies >= the difference in size
between result_mode and shift_mode, for both operands to the XOR, guarantee
safety of the commutation (whether the constant is positive or negative)? We
could perform the shift (in the larger mode) on both of the XOR operands
safely, then XOR together their lower parts.

If, however, we want to play safe and ensure that we deal safely with any XOR
whose top (mode size difference + 1) bits were the same, then I think the
restriction that the XOR constant is positive is neither necessary nor
sufficient; rather (mirroring try_widen_shift_mode) I think we need that
num_sign_bit_copies of the constant in shift_mode is more than the size
difference between result_mode and shift_mode. Hmmm. I might try that patch
at some point; I think it is the right check to make. (Meta-comment: this
would be *so* much easier if we could write unit tests more easily!)

In the meantime I'd be happy to settle for the attached... (Tests are as they
were posted at https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html.)

--Alan

Jeff Law wrote:
> On 07/17/14 10:56, Alan Lawrence wrote:
>> Ok, the attached tests are passing on x86_64-none-linux-gnu,
>> aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for
>> which I've only built a stage 1 compiler (i.e. as far as necessary to
>> assemble). That's with either change to simplify_shift_const_1.
>>
>> As to the addition of "result_mode != shift_mode", or removing the whole
>> check against XOR - I've now tested the latter:
>>
>> bootstrapped on x86_64-none-linux-gnu, check-gcc and check-ada;
>> bootstrapped on arm-none-linux-gnueabihf;
>> bootstrapped on aarch64-none-linux-gnu;
>> cross-tested check-gcc on aarch64-none-elf;
>> cross-tested on arm-none-eabi;
>> (and Uros has bootstrapped on alpha and done a suite of tests, as per
>> https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html).
>>
>> From a perspective of paranoia, I'd lean towards adding "result_mode !=
>> shift_mode", but for neatness removing the whole check against XOR is
>> nicer. So I'd defer to the maintainers as to whether one might be
>> preferable to the other... (but my unproven suspicion is that the two are
>> equivalent, and no case where result_mode != shift_mode is possible!)
> So first, whether or not someone cares about Alpha-VMS isn't the issue
> at hand. It's whether or not the new code is correct.
> Similarly, the fact that the code generation paths are radically
> different now compared to 2004, and that we can't currently trigger the
> cases where the modes are different, isn't the issue; again, it's whether
> or not your code is correct.
>
> I think the key is to look at try_widen_shift_mode and realize that it
> can return a mode larger than the original mode of the operations.
> However, it only does so when it is presented with a case where it knows
> the sign bit being shifted in from the left will be the same as the sign
> bit in the original mode.
>
> In the case of an XOR with the sign bit set in shift_mode, that's not
> going to be the case. We would violate the assumption made when we
> decided to widen the shift to shift_mode.
>
> So your relaxation is safe when shift_mode == result_mode, but unsafe
> otherwise -- even though we don't currently have a testcase for the
> shift_mode != result_mode case, we don't want to break that.
>
> So your updated patch is correct. However, I would ask that you make
> one additional change. Namely, the comment before the two fragments of
> code you changed needs updating. Something like
>
> "... and the constant has its sign bit set in shift_mode and shift_mode
> is different than result_mode".
>
> The 2nd comment should have similar clarifying text.
>
> You should also include your testcase for a cursory review.
>
> So with the inclusion of the testcase and updated comments, we should be
> good to go. However, please post the final patch for that cursory
> review of the testcase.
>
> jeff

diff --git a/gcc/combine.c b/gcc/combine.c
index 0ec7f852c47dc2e90d70210b4b7bc8350806d41d..3517479cf879219bc1012b7379b1f495c939c2f4 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10257,8 +10257,10 @@ simplify_shift_const_1 (enum rtx_code code, enum machine_mode result_mode,
 	  if (CONST_INT_P (XEXP (varop, 1))
 	      /* We can't do this if we have (ashiftrt (xor)) and the
-		 constant has its sign bit set in shift_mode.  */
+		 constant has its sign bit set in shift_mode with shift_mode
+		 wider than result_mode.  */
 	      && !(code == ASHIFTRT && GET_CODE (varop) == XOR
+		   && result_mode != shift_mode
 		   && 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
 					      shift_mode))
 	      && (new_rtx = simplify_const_binary_operation
@@ -10275,10 +10277,12 @@ simplify_shift_const_1 (enum rtx_code code, enum machine_mode result_mode,
 	  /* If we can't do that, try to simplify the shift in each arm of
 	     the logical expression, make a new logical expression, and apply
-	     the inverse distributive law.  This also can't be done
-	     for some (ashiftrt (xor)).  */
+	     the inverse distributive law.  This also can't be done for
+	     (ashiftrt (xor)) where we've widened the shift and the constant
+	     changes the sign bit.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
 	      && !(code == ASHIFTRT && GET_CODE (varop) == XOR
+		   && result_mode != shift_mode
 		   && 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
 					      shift_mode)))
 	    {
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..90e64fd10dc358f10ad03a90041605bc3ccb7011
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile {target sparc64*-*-* aarch64*-*-* x86_64-*-* powerpc64*-*-*} } */
+/* { dg-options "-O2 -fdump-rtl-combine-all" } */
+
+typedef long long int int64_t;
+
+int64_t
+foo (int64_t a)
+{
+  return (~a) >> 63;
+}
+
+/* The combine phase will try to combine not & ashiftrt, and
+   combine_simplify_rtx should transform (ashiftrt (not x) 63)
+   to (not (ashiftrt x 63)) and then to (neg (ge x 0)).  We look for
+   the *attempt* to match this RTL pattern, regardless of whether an
+   actual insn may be found on the platform.  */
+/* { dg-final { scan-rtl-dump "\\(neg:DI \\(ge:DI" "combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..fd6827caed230ea5dd2d6ec4431b11bf826531ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile {target arm*-*-* i?86-*-* powerpc-*-* sparc-*-*} } */
+/* { dg-options "-O2 -fdump-rtl-combine-all" } */
+
+typedef long int32_t;
+
+int32_t
+foo (int32_t a)
+{
+  return (~a) >> 31;
+}
+
+/* The combine phase will try to combine not & ashiftrt, and
+   combine_simplify_rtx should transform (ashiftrt (not x) 31)
+   to (not (ashiftrt x 31)) and then to (neg (ge x 0)).  We look for
+   the *attempt* to match this RTL pattern, regardless of whether an
+   actual insn may be found on the platform.  */
+/* { dg-final { scan-rtl-dump "\\(neg:SI \\(ge:SI" "combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
index cb0f4a04c0fb5f2c93064c47a141556f7fd0f89a..f2c55922f18c0e6840c403f419fe4a98a15e5f97 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/int_comparisons_1.c
@@ -30,18 +30,16 @@
 /* Comparisons against immediate zero, on the 8 signed integer types only.  */
 
 /* { dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
-/* For int64_t and int64x1_t, combine_simplify_rtx failure of
-   https://gcc.gnu.org/ml/gcc/2014-06/msg00253.html
-   prevents generation of cmge....#0, instead producing mvn + sshr.  */
-/* { #dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]cmge\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
 /* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
 /* { dg-final { scan-assembler-times "\[ \t\]cmgt\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
 /* { dg-final { scan-assembler-times "\[ \t\]cmle\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
 /* { dg-final { scan-assembler-times "\[ \t\]cmle\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?0" 2 } } */
 /* { dg-final { scan-assembler-times "\[ \t\]cmlt\[ \t\]+v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*v\[0-9\]+\.\[0-9\]+\[bshd\],\[ \t\]*#?0" 7 } } */
 /* For int64_t and int64x1_t, cmlt ... #0 and sshr ... #63 are equivalent,
-   so allow either.  cmgez issue above results in extra 2 * sshr....63.  */
-/* { dg-final { scan-assembler-times "\[ \t\](?:cmlt|sshr)\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?(?:0|63)" 4 } } */
+   so allow either.  */
+/* { dg-final { scan-assembler-times "\[ \t\](?:cmlt|sshr)\[ \t\]+d\[0-9\]+,\[ \t\]*d\[0-9\]+,\[ \t\]*#?(?:0|63)" 2 } } */
 
 // All should have been compiled into single insns without inverting result:
 /* { dg-final { scan-assembler-not "\[ \t\]not\[ \t\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]mvn\[ \t\]" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/singleton_intrinsics_1.c b/gcc/testsuite/gcc.target/aarch64/singleton_intrinsics_1.c
index 8a8272ba48eedc12b56357bf4073dda791fb10f9..4a0934b01f9442b7f1324a1f4528d45022daf9b8 100644
--- a/gcc/testsuite/gcc.target/aarch64/singleton_intrinsics_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/singleton_intrinsics_1.c
@@ -57,8 +57,7 @@ test_vcle_s64 (int64x1_t a, int64x1_t b)
   return vcle_s64 (a, b);
 }
 
-/* Idiom recognition will cause this testcase not to generate
-   the expected cmge instruction, so do not check for it.  */
+/* { dg-final { scan-assembler-times "\\tcmge\\td\[0-9\]+, d\[0-9\]+, #?0" 1 } } */
 
 uint64x1_t
 test_vcgez_s64 (int64x1_t a)
@@ -236,8 +235,8 @@ test_vrshl_u64 (uint64x1_t a, int64x1_t b)
   return vrshl_u64 (a, b);
 }
 
-/* { dg-final { scan-assembler-times "\\tsshr\\td\[0-9\]+" 3 } } */
-/* Idiom recognition compiles vcltz and vcgez to sshr rather than cmlt/cmge.  */
+/* For int64x1_t, sshr...#63 is output instead of the equivalent cmlt...#0.  */
+/* { dg-final { scan-assembler-times "\\tsshr\\td\[0-9\]+" 2 } } */
 
 int64x1_t
 test_vshr_n_s64 (int64x1_t a)
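
P.S. On the num_sign_bit_copies rule mentioned above (that for an XOR it
conservatively returns the minimum over its two operands), here is a small
standalone sketch of what that counting means for 64-bit values. This is my
own illustration, not GCC code, and the helper name is made up:

/* Count how many of the top bits of a 64-bit value are copies of its
   sign bit, mirroring what num_sign_bit_copies computes for DImode.  */
#include <stdint.h>
#include <stdio.h>

static int
sign_bit_copies64 (int64_t v)
{
  uint64_t u = (uint64_t) v;
  int sign = (int) (u >> 63);
  int n = 1;                    /* the sign bit itself */
  while (n < 64 && (int) ((u >> (63 - n)) & 1) == sign)
    n++;
  return n;
}

int
main (void)
{
  int64_t x = 100, c = -4;
  int nx = sign_bit_copies64 (x);        /* 57 */
  int nc = sign_bit_copies64 (c);        /* 62 */
  int nxor = sign_bit_copies64 (x ^ c);  /* 57 */
  /* The conservative rule: the XOR has at least the minimum of its
     operands' counts.  */
  printf ("%d %d %d\n", nx, nc, nxor);
  return !(nxor >= (nx < nc ? nx : nc));
}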