From patchwork Mon Feb 27 12:33:55 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1748681
Date: Mon, 27 Feb 2023 12:33:55 +0000
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com, richard.sandiford@arm.com
Subject: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]
From: Tamar Christina
List-Id: Gcc-patches mailing list

Hi All,

As Richard S wanted, this now implements a
hook preferred_div_as_shifts_over_mult that indicates whether a target
prefers that the vectorizer decompose division as shifts rather than
multiplication when possible.

In order to be able to use this we need to check whether the current
precision has enough bits to do the operation without any of the additions
overflowing.  We use range information to determine this and only do the
operation if we're sure an overflow won't occur.

This now uses ranger to do this range check.  This seems to work better
than vect_get_range_info, which uses range_query, but I have not switched
the interface of vect_get_range_info over in this PR fix.  As Andy said
before, initializing a ranger instance is cheap but not free, and if the
intention is to call it often during a pass it should be instantiated at
pass startup and passed along to the places that need it.  That is a big
refactoring and doesn't seem right to do in this PR, but we should do it
in GCC 14.  Currently we only instantiate it after a long series of much
cheaper checks.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/108583
	* target.def (preferred_div_as_shifts_over_mult): New.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
	* targhooks.h (default_preferred_div_as_shifts_over_mult): New.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.

gcc/testsuite/ChangeLog:

	PR target/108583
	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
	* gcc.dg/vect/vect-div-bitmask-5.c: New test.

--- inline copy of patch --
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 50a8872a6695b18b9bed0d393bacf733833633db..c85196015e2e53047fcc65d32ef2d3203d2a6bab 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6137,6 +6137,9 @@ instruction pattern.
 There is no need for the hook to handle these two implementation
 approaches itself.
 @end deftypefn

+@deftypefn {Target Hook} bool TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (void)
+When decomposing a division operation, if possible prefer to decompose the
+operation as shifts rather than multiplication by magic constants.
 @end deftypefn

 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can generate better code.

 @hook TARGET_VECTORIZE_VEC_PERM_CONST

+@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT

 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION

diff --git a/gcc/target.def b/gcc/target.def
index e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..8cc18b1f3c5de24c21faf891b9d4d0b6fd5b59d7 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1868,6 +1868,15 @@ correct for most targets.",
 poly_uint64, (const_tree type),
 default_preferred_vector_alignment)

+/* Returns whether the target has a preference for decomposing divisions using
+   shifts rather than multiplies.  */
+DEFHOOK
+(preferred_div_as_shifts_over_mult,
+ "When decomposing a division operation, if possible prefer to decompose the\n\
+operation as shifts rather than multiplication by magic constants.",
+ bool, (void),
+ default_preferred_div_as_shifts_over_mult)
+
 /* Return true if vector alignment is reachable (by peeling N iterations)
    for the given scalar type.  */
 DEFHOOK

diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a6a4809ca91baa5d7fad2244549317a31390f0c2..dda011c59fbd5973ee648dfea195619cc41c71bc 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
 extern unsigned HOST_WIDE_INT default_shift_truncation_mask (machine_mode);
 extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
+extern bool
+default_preferred_div_as_shifts_over_mult (void);
 extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);

 extern tree default_stack_protect_guard (void);

diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 211525720a620d6f533e2da91e03877337a931e7..6396f344eef09dd61f358938846a1c02a70b31d8 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
   return TYPE_ALIGN (type);
 }

+/* The default implementation of
+   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
+
+bool
+default_preferred_div_as_shifts_over_mult (void)
+{
+  return false;
+}
+
 /* By default assume vectors of element TYPE require a multiple of the natural
    alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is false.
    */
 bool

diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..c81f8946922250234bf759e0a0a04ea8c1f73e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+
+static __attribute__((__noinline__)) __attribute__((__noclone__)) V
+foo (V v, unsigned short i)
+{
+  v /= i;
+  return v;
+}
+
+int
+main (void)
+{
+  V v = foo ((V) { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff }, 0xffff);
+  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
+    if (v[i] != 0x00010001)
+      __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..b4eb1a4dacba481e6306b49914d2a29b933de625
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint8_t
+
+#ifndef DEBUG
+#define DEBUG 0
+#endif
+
+#define BASE ((TYPE) -1 < 0 ? -126 : 4)
+
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] + level) / 0xff;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] + level) / 0xff;
+}
+
+int main ()
+{
+  TYPE a[N];
+  TYPE b[N];
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 13;
+      b[i] = BASE + i * 13;
+      if (DEBUG)
+        printf ("%d: 0x%x\n", i, a[i]);
+    }
+
+  fun1 (a, N / 2, N);
+  fun2 (b, N / 2, N);
+
+  for (int i = 0; i < N; ++i)
+    {
+      if (DEBUG)
+        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
+
+      if (a[i] != b[i])
+        __builtin_abort ();
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "divmod pattern recognized" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1766ce277d6b88d8aa3be77e7c8abb504a10a735..31f2a6753b4faccb77351c8c5afed9775888b60f 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3913,6 +3913,84 @@ vect_recog_divmod_pattern (vec_info *vinfo,
       return pattern_stmt;
     }
+  else if ((cst = uniform_integer_cst_p (oprnd1))
+	   && TYPE_UNSIGNED (itype)
+	   && rhs_code == TRUNC_DIV_EXPR
+	   && vectype
+	   && targetm.vectorize.preferred_div_as_shifts_over_mult ())
+    {
+      /* div optimizations using narrowings
+	 we can do the division e.g. shorts by 255 faster by calculating it as
+	 (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+	 double the precision of x.
+
+	 If we imagine a short as being composed of two blocks of bytes then
+	 adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
+	 adding 1 to each sub component:
+
+	       short value of 16-bits
+	 ┌──────────────┬────────────────┐
+	 │              │                │
+	 └──────────────┴────────────────┘
+	   8-bit part1 ▲  8-bit part2   ▲
+	               │                │
+	               │                │
+	              +1               +1
+
+	 after the first addition, we have to shift right by 8, and narrow the
+	 results back to a byte.  Remember that the addition must be done in
+	 double the precision of the input.  However if we know that the addition
+	 `x + 257` does not overflow then we can do the operation in the current
+	 precision.  In which case we don't need the pack and unpacks.  */
+      auto wcst = wi::to_wide (cst);
+      int pow = wi::exact_log2 (wcst + 1);
+      if (pow == (int) (element_precision (vectype) / 2))
+	{
+	  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
+
+	  gimple_ranger ranger;
+	  int_range_max r;
+
+	  /* Check that no overflow will occur.  If we don't have range
+	     information we can't perform the optimization.
+	     */
+	  if (ranger.range_of_expr (r, oprnd0, stmt))
+	    {
+	      wide_int max = r.upper_bound ();
+	      wide_int one = wi::to_wide (build_one_cst (itype));
+	      wide_int adder = wi::add (one, wi::lshift (one, pow));
+	      wi::overflow_type ovf;
+	      wi::add (max, adder, UNSIGNED, &ovf);
+	      if (ovf == wi::OVF_NONE)
+		{
+		  *type_out = vectype;
+		  tree tadder = wide_int_to_tree (itype, adder);
+		  tree rshift = wide_int_to_tree (itype, pow);
+
+		  tree new_lhs1 = vect_recog_temp_ssa_var (itype, NULL);
+		  gassign *patt1
+		    = gimple_build_assign (new_lhs1, PLUS_EXPR, oprnd0, tadder);
+		  append_pattern_def_seq (vinfo, stmt_vinfo, patt1, vectype);
+
+		  tree new_lhs2 = vect_recog_temp_ssa_var (itype, NULL);
+		  patt1 = gimple_build_assign (new_lhs2, RSHIFT_EXPR, new_lhs1,
+					       rshift);
+		  append_pattern_def_seq (vinfo, stmt_vinfo, patt1, vectype);
+
+		  tree new_lhs3 = vect_recog_temp_ssa_var (itype, NULL);
+		  patt1 = gimple_build_assign (new_lhs3, PLUS_EXPR, new_lhs2,
+					       oprnd0);
+		  append_pattern_def_seq (vinfo, stmt_vinfo, patt1, vectype);
+
+		  tree new_lhs4 = vect_recog_temp_ssa_var (itype, NULL);
+		  pattern_stmt = gimple_build_assign (new_lhs4, RSHIFT_EXPR,
+						      new_lhs3, rshift);
+
+		  return pattern_stmt;
+		}
+	    }
+	}
+    }

   if (prec > HOST_BITS_PER_WIDE_INT
       || integer_zerop (oprnd1))