From patchwork Tue Jul 6 19:01:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1501459 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=mSHNMoT8; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GKBkH1ZqMz9sV8 for ; Wed, 7 Jul 2021 05:01:57 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EB33A3893C78 for ; Tue, 6 Jul 2021 19:01:53 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 1E8A53864858 for ; Tue, 6 Jul 2021 19:01:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1E8A53864858 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:To:From:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=sod5bw2N9iM+c9ubQHD6AN0xX56KIPY1237F7UEwvkA=; b=mSHNMoT8TmdYai9zDOpYk5KqOY eMx3gk3ca+VCEE2XUINQG5f4D3Sw0T5WV0jtO15JQmNeGFi4q/tDwIOXlLvj7AP1Uprg2TiqnerDo hp9gBVZgyG+JqZMxXX/uBqvlze77E5S54Jy3hXNa9mS6tQSj2Jo3TNYgYff/8Dz/FC1hAwTzYn3Nk 04aDnKcAVUsam5kpghYt+w8ZK5udv+xAhyZJquCQ3YkqOAa0Bb/S7EKebMQE0F3DVIOKmaLtOLFJf O5GYObMzkxkBZgHhjrj58eKgq3VjNt//wHZkdemgfAb3F6RoAgedDywqlr5EyoKYvCAet4XbFjZNw ft2UFMpw==; Received: from host86-169-60-32.range86-169.btcentralplus.com ([86.169.60.32]:49312 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1m0qJz-0001XJ-Ev for gcc-patches@gcc.gnu.org; Tue, 06 Jul 2021 15:01:27 -0400 From: "Roger Sayle" To: "'GCC Patches'" Subject: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 in match.pd Date: Tue, 6 Jul 2021 20:01:24 +0100 Message-ID: <001b01d77299$522a36e0$f67ea4a0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: Addyl3yvY+xv3My4TcSz8cIhRJAGuw== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" All of the optimizations/transformations mentioned in bugzilla for PR tree-optimization/40210 are already implemented in mainline GCC, with one exception. In comment #5, there's a suggestion that (bswap64(x)>>56)&0xff can be implemented without the bswap as (unsigned char)x, or equivalently x&0xff. This patch implements the above optimization, and closely related variants. For any single bit, (bswap(X)>>C1)&1 can be simplified to (X>>C2)&1, where bit position C2 is the appropriate permutation of C1. Similarly, the bswap can eliminated if the desired set of bits all lie within the same byte, hence (bswap(x)>>8)&255 can always be optimized, as can (bswap(x)>>8)&123. Previously, int foo(long long x) { return (__builtin_bswap64(x) >> 56) & 0xff; } compiled with -O2 to foo: movq %rdi, %rax bswap %rax shrq $56, %rax ret with this patch, it now compiles to foo: movzbl %dil, %eax ret This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make -k check" with no new failures. Ok for mainline? 2021-07-06 Roger Sayle gcc/ChangeLog PR tree-optimization/40210 * builtins.c (get_builtin_precision): Helper function to determine the precision in bits of a built-in function. * builtins.h (get_builtin_precision): Prototype here. * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as (x>>C3)&C2 when possible. Simplify bswap(x)>>C1 as ((T)x)>>C2 when possible. Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255. gcc/testsuite/ChangeLog PR tree-optimization/40210 * gcc.dg/builtin-bswap-13.c: New test. * gcc.dg/builtin-bswap-14.c: New test. Roger --- Roger Sayle NextMove Software Cambridge, UK /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ int test_s32_0_1(int x) { return __builtin_bswap32(x) & 1; } int test_s32_0_2(int x) { return __builtin_bswap32(x) & 2; } int test_s32_0_240(int x) { return __builtin_bswap32(x) & 240; } int test_s32_0_255(int x) { return __builtin_bswap32(x) & 255; } int test_s32_1_1(int x) { return (__builtin_bswap32(x) >> 1) & 1; } int test_s32_7_1(int x) { return (__builtin_bswap32(x) >> 7) & 1; } int test_s32_8_1(int x) { return (__builtin_bswap32(x) >> 8) & 1; } int test_s32_8_240(int x) { return (__builtin_bswap32(x) >> 8) & 240; } int test_s32_8_255(int x) { return (__builtin_bswap32(x) >> 8) & 255; } int test_s32_15_1(int x) { return (__builtin_bswap32(x) >> 15) & 1; } int test_s32_16_1(int x) { return (__builtin_bswap32(x) >> 16) & 1; } int test_s32_16_240(int x) { return (__builtin_bswap32(x) >> 16) & 240; } int test_s32_16_255(int x) { return (__builtin_bswap32(x) >> 16) & 255; } int test_s32_24_1(int x) { return (__builtin_bswap32(x) >> 24) & 1; } int test_s32_24_240(int x) { return (__builtin_bswap32(x) >> 24) & 240; } int test_s32_24_255(int x) { return (__builtin_bswap32(x) >> 24) & 255; } int test_s32_31_1(int x) { return (__builtin_bswap32(x) >> 31) & 1; } int test_S32_0_1(int x) { return (int)__builtin_bswap32(x) & 1; } int test_S32_0_2(int x) { return (int)__builtin_bswap32(x) & 2; } int test_S32_0_240(int x) { return (int)__builtin_bswap32(x) & 240; } int test_S32_0_255(int x) { return (int)__builtin_bswap32(x) & 255; } int test_S32_1_1(int x) { return ((int)__builtin_bswap32(x) >> 1) & 1; } int test_S32_7_1(int x) { return ((int)__builtin_bswap32(x) >> 7) & 1; } int test_S32_8_1(int x) { return ((int)__builtin_bswap32(x) >> 8) & 1; } int test_S32_8_240(int x) { return ((int)__builtin_bswap32(x) >> 8) & 240; } int test_S32_8_255(int x) { return ((int)__builtin_bswap32(x) >> 8) & 255; } int test_S32_15_1(int x) { return ((int)__builtin_bswap32(x) >> 15) & 1; } int test_S32_16_1(int x) { return ((int)__builtin_bswap32(x) >> 16) & 1; } int test_S32_16_240(int x) { return ((int)__builtin_bswap32(x) >> 16) & 240; } int test_S32_16_255(int x) { return ((int)__builtin_bswap32(x) >> 16) & 255; } int test_S32_24_1(int x) { return ((int)__builtin_bswap32(x) >> 24) & 1; } int test_S32_24_240(int x) { return ((int)__builtin_bswap32(x) >> 24) & 240; } int test_S32_24_255(int x) { return ((int)__builtin_bswap32(x) >> 24) & 255; } int test_S32_31_1(int x) { return ((int)__builtin_bswap32(x) >> 31) & 1; } unsigned int test_u32_24_255(unsigned int x) { return (__builtin_bswap32(x) >> 24) & 255; } long long test_s64_0_1(long long x) { return __builtin_bswap64(x) & 1; } long long test_s64_0_2(long long x) { return __builtin_bswap64(x) & 2; } long long test_s64_0_240(long long x) { return __builtin_bswap64(x) & 240; } long long test_s64_0_255(long long x) { return __builtin_bswap64(x) & 255; } long long test_s64_7_1(long long x) { return (__builtin_bswap64(x) >> 7) & 1; } long long test_s64_8_1(long long x) { return (__builtin_bswap64(x) >> 8) & 1; } long long test_s64_8_240(long long x) { return (__builtin_bswap64(x) >> 56) & 240; } long long test_s64_8_255(long long x) { return (__builtin_bswap64(x) >> 8) & 255; } long long test_s64_9_1(long long x) { return (__builtin_bswap64(x) >> 9) & 1; } long long test_s64_31_1(long long x) { return (__builtin_bswap64(x) >> 31) & 1; } long long test_s64_32_1(long long x) { return (__builtin_bswap64(x) >> 32) & 1; } long long test_s64_32_240(long long x) { return (__builtin_bswap64(x) >> 32) & 240; } long long test_s64_32_255(long long x) { return (__builtin_bswap64(x) >> 32) & 255; } long long test_s64_33_1(long long x) { return (__builtin_bswap64(x) >> 33) & 1; } long long test_s64_48_1(long long x) { return (__builtin_bswap64(x) >> 48) & 1; } long long test_s64_48_240(long long x) { return (__builtin_bswap64(x) >> 48) & 240; } long long test_s64_48_255(long long x) { return (__builtin_bswap64(x) >> 48) & 255; } long long test_s64_56_1(long long x) { return (__builtin_bswap64(x) >> 56) & 1; } long long test_s64_56_240(long long x) { return (__builtin_bswap64(x) >> 56) & 240; } long long test_s64_56_255(long long x) { return (__builtin_bswap64(x) >> 56) & 255; } long long test_s64_57_1(long long x) { return (__builtin_bswap64(x) >> 57) & 1; } long long test_s64_63_1(long long x) { return (__builtin_bswap64(x) >> 63) & 1; } long long test_S64_0_1(long long x) { return (long long)__builtin_bswap64(x) & 1; } long long test_S64_0_2(long long x) { return (long long)__builtin_bswap64(x) & 2; } long long test_S64_0_240(long long x) { return (long long)__builtin_bswap64(x) & 240; } long long test_S64_0_255(long long x) { return (long long)__builtin_bswap64(x) & 255; } long long test_S64_7_1(long long x) { return ((long long)__builtin_bswap64(x) >> 7) & 1; } long long test_S64_8_1(long long x) { return ((long long)__builtin_bswap64(x) >> 8) & 1; } long long test_S64_8_240(long long x) { return ((long long)__builtin_bswap64(x) >> 56) & 240; } long long test_S64_8_255(long long x) { return ((long long)__builtin_bswap64(x) >> 8) & 255; } long long test_S64_9_1(long long x) { return ((long long)__builtin_bswap64(x) >> 9) & 1; } long long test_S64_31_1(long long x) { return ((long long)__builtin_bswap64(x) >> 31) & 1; } long long test_S64_32_1(long long x) { return ((long long)__builtin_bswap64(x) >> 32) & 1; } long long test_S64_32_240(long long x) { return ((long long)__builtin_bswap64(x) >> 32) & 240; } long long test_S64_32_255(long long x) { return ((long long)__builtin_bswap64(x) >> 32) & 255; } long long test_S64_33_1(long long x) { return ((long long)__builtin_bswap64(x) >> 33) & 1; } long long test_S64_48_1(long long x) { return ((long long)__builtin_bswap64(x) >> 48) & 1; } long long test_S64_48_240(long long x) { return ((long long)__builtin_bswap64(x) >> 48) & 240; } long long test_S64_48_255(long long x) { return ((long long)__builtin_bswap64(x) >> 48) & 255; } long long test_S64_56_1(long long x) { return ((long long)__builtin_bswap64(x) >> 56) & 1; } long long test_S64_56_240(long long x) { return ((long long)__builtin_bswap64(x) >> 56) & 240; } long long test_S64_56_255(long long x) { return ((long long)__builtin_bswap64(x) >> 56) & 255; } long long test_S64_57_1(long long x) { return ((long long)__builtin_bswap64(x) >> 57) & 1; } long long test_S64_63_1(long long x) { return ((long long)__builtin_bswap64(x) >> 63) & 1; } unsigned long long test_u64_56_255(unsigned long long x) { return (__builtin_bswap64(x) >> 56) & 255; } short test_s16_0_1(short x) { return __builtin_bswap16(x) & 1; } short test_s16_0_240(short x) { return __builtin_bswap16(x) & 240; } short test_s16_0_255(short x) { return __builtin_bswap16(x) & 255; } short test_s16_1_1(short x) { return (__builtin_bswap16(x) >> 1) & 1; } short test_s16_7_1(short x) { return (__builtin_bswap16(x) >> 7) & 1; } short test_s16_8_1(short x) { return (__builtin_bswap16(x) >> 8) & 1; } short test_s16_8_240(short x) { return (__builtin_bswap16(x) >> 8) & 240; } short test_s16_8_255(short x) { return (__builtin_bswap16(x) >> 8) & 255; } short test_s16_9_1(short x) { return (__builtin_bswap16(x) >> 9) & 1; } short test_s16_15_1(short x) { return (__builtin_bswap16(x) >> 15) & 1; } short test_S16_0_1(short x) { return (short)__builtin_bswap16(x) & 1; } short test_S16_0_240(short x) { return (short)__builtin_bswap16(x) & 240; } short test_S16_0_255(short x) { return (short)__builtin_bswap16(x) & 255; } short test_S16_1_1(short x) { return ((short)__builtin_bswap16(x) >> 1) & 1; } short test_S16_7_1(short x) { return ((short)__builtin_bswap16(x) >> 7) & 1; } short test_S16_8_1(short x) { return ((short)__builtin_bswap16(x) >> 8) & 1; } short test_S16_8_240(short x) { return ((short)__builtin_bswap16(x) >> 8) & 240; } short test_S16_8_255(short x) { return ((short)__builtin_bswap16(x) >> 8) & 255; } short test_S16_9_1(short x) { return ((short)__builtin_bswap16(x) >> 9) & 1; } short test_S16_15_1(short x) { return ((short)__builtin_bswap16(x) >> 15) & 1; } unsigned short test_u16_8_255(unsigned short x) { return (__builtin_bswap16(x) >> 8) & 255; } /* Shifts only */ int test_s32_24(int x) { return __builtin_bswap32(x) >> 24; } int test_s32_25(int x) { return __builtin_bswap32(x) >> 25; } int test_s32_30(int x) { return __builtin_bswap32(x) >> 30; } int test_s32_31(int x) { return __builtin_bswap32(x) >> 31; } unsigned int test_u32_24(unsigned int x) { return __builtin_bswap32(x) >> 24; } unsigned int test_u32_25(unsigned int x) { return __builtin_bswap32(x) >> 25; } unsigned int test_u32_30(unsigned int x) { return __builtin_bswap32(x) >> 30; } unsigned int test_u32_31(unsigned int x) { return __builtin_bswap32(x) >> 31; } long long test_s64_56(long long x) { return __builtin_bswap64(x) >> 56; } long long test_s64_57(long long x) { return __builtin_bswap64(x) >> 57; } long long test_s64_62(long long x) { return __builtin_bswap64(x) >> 62; } long long test_s64_63(long long x) { return __builtin_bswap64(x) >> 63; } unsigned long long test_u64_56(unsigned long long x) { return __builtin_bswap64(x) >> 56; } unsigned long long test_u64_57(unsigned long long x) { return __builtin_bswap64(x) >> 57; } unsigned long long test_u64_62(unsigned long long x) { return __builtin_bswap64(x) >> 62; } unsigned long long test_u64_63(unsigned long long x) { return __builtin_bswap64(x) >> 63; } short test_s16_8(short x) { return __builtin_bswap16(x) >> 8; } short test_s16_9(short x) { return __builtin_bswap16(x) >> 9; } short test_s16_14(short x) { return __builtin_bswap16(x) >> 14; } short test_s16_15(short x) { return __builtin_bswap16(x) >> 15; } unsigned short test_u16_8(unsigned short x) { return __builtin_bswap16(x) >> 8; } unsigned short test_u16_9(unsigned short x) { return __builtin_bswap16(x) >> 9; } unsigned short test_u16_14(unsigned short x) { return __builtin_bswap16(x) >> 14; } unsigned short test_u16_15(unsigned short x) { return __builtin_bswap16(x) >> 15; } /* { dg-final { scan-tree-dump-not "__builtin_bswap" "optimized" } } */ /* { dg-do run } */ /* { dg-options "-O2" } */ extern void abort (void); __attribute__ ((noinline, noclone)) static int rt32 (int x, int y, int z) { return (__builtin_bswap32(x) >> y) & z; } #define TEST32(X,Y,Z) if(((__builtin_bswap32(X)>>Y)&Z)!=rt32(X,Y,Z)) abort() void test32(int x) { TEST32(x,0,1); TEST32(x,0,255); TEST32(x,1,1); TEST32(x,2,1); TEST32(x,3,1); TEST32(x,4,1); TEST32(x,5,1); TEST32(x,6,1); TEST32(x,7,1); TEST32(x,8,1); TEST32(x,8,255); TEST32(x,9,1); TEST32(x,10,1); TEST32(x,11,1); TEST32(x,12,1); TEST32(x,13,1); TEST32(x,14,1); TEST32(x,15,1); TEST32(x,16,1); TEST32(x,16,255); TEST32(x,17,1); TEST32(x,18,1); TEST32(x,19,1); TEST32(x,20,1); TEST32(x,21,1); TEST32(x,22,1); TEST32(x,23,1); TEST32(x,24,1); TEST32(x,24,255); TEST32(x,25,1); TEST32(x,26,1); TEST32(x,27,1); TEST32(x,28,1); TEST32(x,29,1); TEST32(x,30,1); TEST32(x,31,1); } #if __SIZEOF_LONG_LONG__ == 8 __attribute__ ((noinline, noclone)) static long long rt64 (long long x, int y, long long z) { return (__builtin_bswap64(x) >> y) & z; } #define TEST64(X,Y,Z) if(((__builtin_bswap64(X)>>Y)&Z)!=rt64(X,Y,Z)) abort() void test64(long long x) { TEST64(x,0,1); TEST64(x,0,255); TEST64(x,1,1); TEST64(x,2,1); TEST64(x,3,1); TEST64(x,4,1); TEST64(x,5,1); TEST64(x,6,1); TEST64(x,7,1); TEST64(x,8,1); TEST64(x,8,255); TEST64(x,9,1); TEST64(x,10,1); TEST64(x,11,1); TEST64(x,12,1); TEST64(x,13,1); TEST64(x,14,1); TEST64(x,15,1); TEST64(x,16,1); TEST64(x,16,255); TEST64(x,17,1); TEST64(x,18,1); TEST64(x,19,1); TEST64(x,20,1); TEST64(x,21,1); TEST64(x,22,1); TEST64(x,23,1); TEST64(x,24,1); TEST64(x,24,255); TEST64(x,25,1); TEST64(x,26,1); TEST64(x,27,1); TEST64(x,28,1); TEST64(x,29,1); TEST64(x,30,1); TEST64(x,31,1); TEST64(x,32,1); TEST64(x,32,255); TEST64(x,33,1); TEST64(x,34,1); TEST64(x,35,1); TEST64(x,36,1); TEST64(x,37,1); TEST64(x,38,1); TEST64(x,39,1); TEST64(x,40,1); TEST64(x,40,255); TEST64(x,41,1); TEST64(x,42,1); TEST64(x,43,1); TEST64(x,44,1); TEST64(x,45,1); TEST64(x,46,1); TEST64(x,47,1); TEST64(x,48,1); TEST64(x,48,255); TEST64(x,49,1); TEST64(x,50,1); TEST64(x,51,1); TEST64(x,52,1); TEST64(x,53,1); TEST64(x,54,1); TEST64(x,55,1); TEST64(x,56,1); TEST64(x,56,255); TEST64(x,57,1); TEST64(x,58,1); TEST64(x,59,1); TEST64(x,60,1); TEST64(x,61,1); TEST64(x,62,1); TEST64(x,63,1); } #endif __attribute__ ((noinline, noclone)) static int rt16 (int x, int y, int z) { return (__builtin_bswap16(x) >> y) & z; } #define TEST16(X,Y,Z) if(((__builtin_bswap16(X)>>Y)&Z)!=rt16(X,Y,Z)) abort() void test16(int x) { TEST16(x,0,1); TEST16(x,0,255); TEST16(x,1,1); TEST16(x,2,1); TEST16(x,3,1); TEST16(x,4,1); TEST16(x,5,1); TEST16(x,6,1); TEST16(x,7,1); TEST16(x,8,1); TEST16(x,8,255); TEST16(x,9,1); TEST16(x,10,1); TEST16(x,11,1); TEST16(x,12,1); TEST16(x,13,1); TEST16(x,14,1); TEST16(x,15,1); } int main() { test32(0x00000000); test32(0xffffffff); test32(0x00000001); test32(0x00000002); test32(0x00000004); test32(0x00000008); test32(0x00000010); test32(0x00000020); test32(0x00000040); test32(0x00000080); test32(0x00000100); test32(0x00000200); test32(0x00000400); test32(0x00000800); test32(0x00001000); test32(0x00002000); test32(0x00004000); test32(0x00008000); test32(0x00010000); test32(0x00020000); test32(0x00040000); test32(0x00080000); test32(0x00100000); test32(0x00200000); test32(0x00400000); test32(0x00800000); test32(0x01000000); test32(0x02000000); test32(0x04000000); test32(0x08000000); test32(0x10000000); test32(0x20000000); test32(0x40000000); test32(0x80000000); test32(0x12345678); test32(0x87654321); test32(0xdeadbeef); test32(0xcafebabe); #if __SIZEOF_LONG_LONG__ == 8 test64(0x0000000000000000ll); test64(0xffffffffffffffffll); test64(0x0000000000000001ll); test64(0x0000000000000002ll); test64(0x0000000000000004ll); test64(0x0000000000000008ll); test64(0x0000000000000010ll); test64(0x0000000000000020ll); test64(0x0000000000000040ll); test64(0x0000000000000080ll); test64(0x0000000000000100ll); test64(0x0000000000000200ll); test64(0x0000000000000400ll); test64(0x0000000000000800ll); test64(0x0000000000001000ll); test64(0x0000000000002000ll); test64(0x0000000000004000ll); test64(0x0000000000008000ll); test64(0x0000000000010000ll); test64(0x0000000000020000ll); test64(0x0000000000040000ll); test64(0x0000000000080000ll); test64(0x0000000000100000ll); test64(0x0000000000200000ll); test64(0x0000000000400000ll); test64(0x0000000000800000ll); test64(0x0000000001000000ll); test64(0x0000000002000000ll); test64(0x0000000004000000ll); test64(0x0000000008000000ll); test64(0x0000000010000000ll); test64(0x0000000020000000ll); test64(0x0000000040000000ll); test64(0x0000000080000000ll); test64(0x0000000100000000ll); test64(0x0000000200000000ll); test64(0x0000000400000000ll); test64(0x0000000800000000ll); test64(0x0000001000000000ll); test64(0x0000002000000000ll); test64(0x0000004000000000ll); test64(0x0000008000000000ll); test64(0x0000010000000000ll); test64(0x0000020000000000ll); test64(0x0000040000000000ll); test64(0x0000080000000000ll); test64(0x0000100000000000ll); test64(0x0000200000000000ll); test64(0x0000400000000000ll); test64(0x0000800000000000ll); test64(0x0001000000000000ll); test64(0x0002000000000000ll); test64(0x0004000000000000ll); test64(0x0008000000000000ll); test64(0x0010000000000000ll); test64(0x0020000000000000ll); test64(0x0040000000000000ll); test64(0x0080000000000000ll); test64(0x0100000000000000ll); test64(0x0200000000000000ll); test64(0x0400000000000000ll); test64(0x0800000000000000ll); test64(0x1000000000000000ll); test64(0x2000000000000000ll); test64(0x4000000000000000ll); test64(0x8000000000000000ll); test64(0x0123456789abcdefll); test64(0xfedcba9876543210ll); test64(0xdeadbeefdeadbeefll); test64(0xcafebabecafebabell); #endif test16(0x0000); test16(0xffff); test16(0x0001); test16(0x0002); test16(0x0004); test16(0x0008); test16(0x0010); test16(0x0020); test16(0x0040); test16(0x0080); test16(0x0100); test16(0x0200); test16(0x0400); test16(0x0800); test16(0x1000); test16(0x2000); test16(0x4000); test16(0x8000); test16(0x1234); test16(0x4321); test16(0xdead); test16(0xbeef); test16(0xcafe); test16(0xbabe); return 0; } diff --git a/gcc/builtins.c b/gcc/builtins.c index 855ad1e..86dfeae 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -14831,3 +14831,42 @@ builtin_fnspec (tree callee) return ""; } } + +/* Return the precision (in bits) of a built-in function. + Returns zero for unrecognized built-in functions. */ + +unsigned int +get_builtin_precision (enum built_in_function fn) +{ + switch (fn) + { + case BUILT_IN_BSWAP16: + return 16; + case BUILT_IN_BSWAP32: + return 32; + case BUILT_IN_BSWAP64: + return 64; + case BUILT_IN_BSWAP128: + return 128; + case BUILT_IN_CLZ: + case BUILT_IN_CTZ: + case BUILT_IN_FFS: + case BUILT_IN_PARITY: + case BUILT_IN_POPCOUNT: + return TYPE_PRECISION (integer_type_node); + case BUILT_IN_CLZL: + case BUILT_IN_CTZL: + case BUILT_IN_FFSL: + case BUILT_IN_PARITYL: + case BUILT_IN_POPCOUNTL: + return TYPE_PRECISION (long_integer_type_node); + case BUILT_IN_CLZLL: + case BUILT_IN_CTZLL: + case BUILT_IN_FFSLL: + case BUILT_IN_PARITYLL: + case BUILT_IN_POPCOUNTLL: + return TYPE_PRECISION (long_long_integer_type_node); + default: + return 0; + } +} diff --git a/gcc/builtins.h b/gcc/builtins.h index e71f40c..cfb48f0 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -124,6 +124,7 @@ extern void expand_ifn_atomic_bit_test_and (gcall *); extern void expand_ifn_atomic_compare_exchange (gcall *); extern rtx expand_builtin (tree, rtx, rtx, machine_mode, int); extern enum built_in_function builtin_mathfn_code (const_tree); +extern unsigned int get_builtin_precision (enum built_in_function); extern tree fold_builtin_expect (location_t, tree, tree, tree, tree); extern bool avoid_folding_inline_builtin (tree); extern tree fold_call_expr (location_t, tree, bool); diff --git a/gcc/match.pd b/gcc/match.pd index 39fb57e..d5a7fbf 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3610,7 +3610,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (complex (convert:itype @0) (negate (convert:itype @1))))) /* BSWAP simplifications, transforms checked by gcc.dg/builtin-bswap-8.c. */ -(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 BUILT_IN_BSWAP64) +(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 + BUILT_IN_BSWAP64 BUILT_IN_BSWAP128) (simplify (bswap (bswap @0)) @0) @@ -3620,8 +3621,69 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (for bitop (bit_xor bit_ior bit_and) (simplify (bswap (bitop:c (bswap @0) @1)) - (bitop @0 (bswap @1))))) - + (bitop @0 (bswap @1)))) + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2. */ + (simplify + (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2)) + INTEGER_CST@3) + (if (tree_fits_uhwi_p (@2) + && tree_fits_uhwi_p (@3)) + (with + { + enum built_in_function fncode = as_builtin_fn (bswap); + unsigned HOST_WIDE_INT prec = get_builtin_precision (fncode); + unsigned HOST_WIDE_INT bits = tree_to_uhwi (@2); + unsigned HOST_WIDE_INT mask = tree_to_uhwi (@3); + unsigned HOST_WIDE_INT lo = bits & 7; + unsigned HOST_WIDE_INT hi = bits - lo; + } + (if (bits < prec + && mask < (256u>>lo) + && bits < TYPE_PRECISION (TREE_TYPE(@0))) + (with { unsigned HOST_WIDE_INT ns = (prec - (hi + 8)) + lo; } + (if (ns == 0) + (bit_and (convert @1) @3) + (with + { + tree utype = unsigned_type_for (TREE_TYPE (@1)); + tree nst = build_int_cst (integer_type_node, ns); + } + (bit_and (convert (rshift:utype (convert:utype @1) {nst;})) @3)))))))) + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2. */ + (simplify + (rshift (convert? (bswap @0)) INTEGER_CST@1) + (if (tree_fits_uhwi_p (@1)) + (with + { + enum built_in_function fncode = as_builtin_fn (bswap); + unsigned HOST_WIDE_INT prec = get_builtin_precision (fncode); + unsigned HOST_WIDE_INT bits = tree_to_uhwi (@1); + } + (if (bits + 8 == prec) + (if (TYPE_UNSIGNED (type)) + (convert (convert:unsigned_char_type_node @0)) + (convert (convert:signed_char_type_node @0))) + (if (bits < prec && bits + 8 > prec) + (with + { + tree nst = build_int_cst (integer_type_node, bits & 7); + tree bt = TYPE_UNSIGNED (type) ? unsigned_char_type_node + : signed_char_type_node; + } + (convert (rshift:bt (convert:bt @0) {nst;})))))))) + /* bswap(x) & C1 can sometimes be simplified to (x >> C2) & C1. */ + (simplify + (bit_and (convert? (bswap @0)) INTEGER_CST@1) + (if (tree_fits_uhwi_p (@1) + && tree_to_uhwi (@1) < 256) + (with + { + enum built_in_function fncode = as_builtin_fn (bswap); + unsigned HOST_WIDE_INT prec = get_builtin_precision (fncode); + tree utype = unsigned_type_for (TREE_TYPE (@0)); + tree nst = build_int_cst (integer_type_node, prec - 8); + } + (bit_and (convert (rshift:utype (convert:utype @0) {nst;})) @1))))) /* Combine COND_EXPRs and VEC_COND_EXPRs. */