From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] Improved handling of MULT_EXPR in bit CCP.
Date: Mon, 9 Aug 2021 09:12:51 +0100
Message-ID: <00d501d78cf6$59e62f20$0db28d60$@nextmovesoftware.com>

This patch allows GCC to constant fold (i | (i<<16)) | ((i<<24) | (i<<8)),
where i is an unsigned char, or the equivalent (i*65537) | (i*16777472),
to i*16843009 (an exhaustive check of this identity appears at the very end
of this message).  The trick is to teach tree_nonzero_bits which bits may
be set in the result of a multiplication by a constant, given which bits
are potentially set in the operands.  This allows the optimizations
recently added to match.pd to catch more cases.

The required mask/value pair for a multiplication can be calculated using
a classical shift-and-add algorithm, given that we already have
implementations for both addition and shift by a constant; an illustrative
sketch of this propagation follows the ChangeLog below.  To keep this
optimization "cheap", it is only used when the constant multiplier has
only a few bits set (unless flag_expensive_optimizations), and a
special-case fast path handles the common case where the (non-constant)
operand has no bits that are guaranteed to be set.  I have no evidence
that this functionality causes performance issues; it is simply that
sparse multipliers provide the largest benefit to CCP.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?

2021-08-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-ccp.c (bit_value_mult_const): New helper function to
	calculate the mask-value pair result of a multiplication by an
	unsigned constant.
	(bit_value_binop) [MULT_EXPR]: Call it from here for
	multiplications by non-negative constants.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-ior-5.c: New test case.
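For illustration only (not part of the patch): the mask/value propagation
can be modelled on plain unsigned int instead of widest_int.  In a CCP
mask/value pair, a 1 bit in the mask means "this bit is unknown" and the
value holds the bits known to be set.  The name sketch_mult_const below is
hypothetical, and width/sign extension from the real implementation is
omitted; this is a minimal model of the shift-and-add idea, not the patch
itself.

#include <stdio.h>

/* Illustrative sketch: propagate a value/mask pair (RVAL, RMASK) through
   multiplication by the unsigned constant C, accumulating the result in
   *VAL and *MASK.  */
static void
sketch_mult_const (unsigned int rval, unsigned int rmask, unsigned int c,
                   unsigned int *val, unsigned int *mask)
{
  unsigned int sum_val = 0, sum_mask = 0;
  unsigned int rval_lo = rval & ~rmask;     /* bits known to be set */

  while (c != 0)
    {
      int bitpos = __builtin_ctz (c);       /* lowest set bit of multiplier */
      unsigned int term_val = rval_lo << bitpos;
      unsigned int term_mask = rmask << bitpos;

      /* sum += term: a result bit becomes unknown wherever the sum with
         all unknown bits clear differs from the sum with all unknown bits
         set, i.e. wherever an unknown carry can reach.  */
      unsigned int lo = sum_val + term_val;
      unsigned int hi = (sum_val | sum_mask) + (term_val | term_mask);
      sum_mask |= term_mask | (lo ^ hi);
      sum_val = lo;

      c &= c - 1;                            /* clear that multiplier bit */
    }

  *val = sum_val;
  *mask = sum_mask;
}

int
main (void)
{
  unsigned int val, mask;
  /* An unsigned char argument: no bits known set, low 8 bits unknown.  */
  sketch_mult_const (0, 0xff, 65537, &val, &mask);
  printf ("i*65537:    val=%#010x mask=%#010x\n", val, mask);  /* mask 0x00ff00ff */
  sketch_mult_const (0, 0xff, 16777472, &val, &mask);
  printf ("i*16777472: val=%#010x mask=%#010x\n", val, mask);  /* mask 0xff00ff00 */
  return 0;
}

In this model an unsigned char argument zero-extended to 32 bits has
rval = 0 and rmask = 0xff; multiplying by 65537 (0x10001) yields mask
0x00ff00ff, while multiplying by 16777472 (0x01000100) yields mask
0xff00ff00.  The two products therefore have disjoint possibly-nonzero
bits, which is what lets the subsequent IOR be simplified.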
Roger
---

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

unsigned int test_ior(unsigned char i)
{
  return (i | (i<<16)) | ((i<<24) | (i<<8));
}

unsigned int test_xor(unsigned char i)
{
  return (i ^ (i<<16)) ^ ((i<<24) ^ (i<<8));
}

/* { dg-final { scan-tree-dump-not " \\^ " "optimized" } } */
/* { dg-final { scan-tree-dump-not " \\| " "optimized" } } */
/* { dg-final { scan-tree-dump-times " \\* 16843009" 2 "optimized" } } */

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 9ce6214..86ca3ae 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1340,6 +1340,66 @@ bit_value_unop (enum tree_code code, signop type_sgn, int type_precision,
     }
 }
 
+/* Determine the mask pair *VAL and *MASK from multiplying the
+   argument mask pair RVAL, RMASK by the unsigned constant C.  */
+void
+bit_value_mult_const (signop sgn, int width,
+                      widest_int *val, widest_int *mask,
+                      const widest_int &rval, const widest_int &rmask,
+                      widest_int c)
+{
+  widest_int sum_mask = 0;
+
+  /* Ensure rval_lo only contains known bits.  */
+  widest_int rval_lo = wi::bit_and_not (rval, rmask);
+
+  if (rval_lo != 0)
+    {
+      /* General case (some bits of multiplicand are known set).  */
+      widest_int sum_val = 0;
+      while (c != 0)
+        {
+          /* Determine the lowest bit set in the multiplier.  */
+          int bitpos = wi::ctz (c);
+          widest_int term_mask = rmask << bitpos;
+          widest_int term_val = rval_lo << bitpos;
+
+          /* sum += term.  */
+          widest_int lo = sum_val + term_val;
+          widest_int hi = (sum_val | sum_mask) + (term_val | term_mask);
+          sum_mask |= term_mask | (lo ^ hi);
+          sum_val = lo;
+
+          /* Clear this bit in the multiplier.  */
+          c ^= wi::lshift (1, bitpos);
+        }
+      /* Correctly extend the result value.  */
+      *val = wi::ext (sum_val, width, sgn);
+    }
+  else
+    {
+      /* Special case (no bits of multiplicand are known set).  */
+      while (c != 0)
+        {
+          /* Determine the lowest bit set in the multiplier.  */
+          int bitpos = wi::ctz (c);
+          widest_int term_mask = rmask << bitpos;
+
+          /* sum += term.  */
+          widest_int hi = sum_mask + term_mask;
+          sum_mask |= term_mask | hi;
+
+          /* Clear this bit in the multiplier.  */
+          c ^= wi::lshift (1, bitpos);
+        }
+      *val = 0;
+    }
+
+  /* Correctly extend the result mask.  */
+  *mask = wi::ext (sum_mask, width, sgn);
+}
+
+
 /* Apply the operation CODE in type TYPE to the value, mask pairs
    R1VAL, R1MASK and R2VAL, R2MASK representing a values of type R1TYPE
    and R2TYPE and set the value, mask pair *VAL and *MASK to the result.  */
@@ -1482,24 +1542,33 @@ bit_value_binop (enum tree_code code, signop sgn, int width,
       }
 
     case MULT_EXPR:
-      {
-        /* Just track trailing zeros in both operands and transfer
-           them to the other.  */
-        int r1tz = wi::ctz (r1val | r1mask);
-        int r2tz = wi::ctz (r2val | r2mask);
-        if (r1tz + r2tz >= width)
-          {
-            *mask = 0;
-            *val = 0;
-          }
-        else if (r1tz + r2tz > 0)
-          {
-            *mask = wi::ext (wi::mask (r1tz + r2tz, true),
-                             width, sgn);
-            *val = 0;
-          }
-        break;
-      }
+      if (r2mask == 0
+          && !wi::neg_p (r2val, sgn)
+          && (flag_expensive_optimizations || wi::popcount (r2val) < 8))
+        bit_value_mult_const (sgn, width, val, mask, r1val, r1mask, r2val);
+      else if (r1mask == 0
+               && !wi::neg_p (r1val, sgn)
+               && (flag_expensive_optimizations || wi::popcount (r1val) < 8))
+        bit_value_mult_const (sgn, width, val, mask, r2val, r2mask, r1val);
+      else
+        {
+          /* Just track trailing zeros in both operands and transfer
+             them to the other.  */
+          int r1tz = wi::ctz (r1val | r1mask);
+          int r2tz = wi::ctz (r2val | r2mask);
+          if (r1tz + r2tz >= width)
+            {
+              *mask = 0;
+              *val = 0;
+            }
+          else if (r1tz + r2tz > 0)
+            {
+              *mask = wi::ext (wi::mask (r1tz + r2tz, true),
+                               width, sgn);
+              *val = 0;
+            }
+        }
+      break;
 
     case EQ_EXPR:
     case NE_EXPR:
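Standalone sanity check, separate from the patch above: the following
program (entirely illustrative, not proposed for the testsuite) exhaustively
verifies the arithmetic identity that the description and the new test rely
on, namely that the IOR of the four shifted copies of a byte equals a single
multiplication by 16843009, and likewise for the two partial products
(65537 + 16777472 == 16843009).

#include <assert.h>

int
main (void)
{
  /* For a byte value i the four shifted copies occupy disjoint bytes,
     so their IOR equals i * 16843009, which is the fold the patch
     enables CCP to prove.  */
  for (unsigned int i = 0; i <= 0xff; i++)
    {
      unsigned int ior = (i | (i << 16)) | ((i << 24) | (i << 8));
      assert (ior == i * 16843009u);
      assert (((i * 65537u) | (i * 16777472u)) == i * 16843009u);
    }
  return 0;
}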