From patchwork Fri Dec 29 14:41:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1881091
Date: Fri, 29 Dec 2023 14:41:08 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH]AArch64 Update costing for vector conversions [PR110625]

Hi All,

In gimple the operation

  short _8;
  double _9;
  _9 = (double) _8;

denotes two operations.  First we have to widen from short to long and then
convert this integer to a double.

Currently however we only count the widen/truncate operations:

  (double) _5 6 times vec_promote_demote costs 12 in body
  (double) _5 12 times vec_promote_demote costs 24 in body

but not the actual conversion operation, which needs an additional 12
instructions in the attached testcase.  Without this the attached testcase ends
up incorrectly thinking that it's beneficial to vectorize the loop at a very
high VF = 8 (4x unrolled).

Because we can't change the mid-end to account for this, the costing code in the
backend now keeps track of whether the previous operation was a
promotion/demotion and adjusts the expected number of instructions as follows:

1. If it's the first FLOAT_EXPR and the precision of the lhs and rhs are
   different, double it, since we need to convert and promote.
2. If the previous operation was a demotion/promotion then reduce the
   cost of the current operation by the amount we added extra in the last.

With the patch we get:

  (double) _5 6 times vec_promote_demote costs 24 in body
  (double) _5 12 times vec_promote_demote costs 36 in body

which correctly accounts for 30 operations.

This fixes the regression reported on Neoverse N2 and using the new generic
Armv9-a cost model.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/110625
	* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
	Adjust throughput and latency calculations for vector conversions.
	(class aarch64_vector_costs): Add m_num_last_promote_demote.

gcc/testsuite/ChangeLog:

	PR target/110625
	* gcc.target/aarch64/pr110625_4.c: New test.
	* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Add
	--param aarch64-sve-compare-costs=0.
	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f9850320f61c5ddccf47e6583d304e5f405a484f..5622221413e52717974b96f79cc83008f237c536 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16077,6 +16077,15 @@ private:
      leaving a vectorization of { elts }.  */
   bool m_stores_to_vector_load_decl = false;
 
+  /* Non-zero if the last operation we costed is a vector promotion or demotion.
+     In this case the value is the number of insn in the last operation.
+
+     On AArch64 vector promotion and demotions require us to first widen or
+     narrow the input and only after that emit conversion instructions.  For
+     costing this means we need to emit the cost of the final conversions as
+     well.  */
+  unsigned int m_num_last_promote_demote = 0;
+
   /* - If M_VEC_FLAGS is zero then we're costing the original scalar code.
      - If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then we're costing Advanced
        SIMD code.
@@ -17132,6 +17141,29 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
     stmt_cost = aarch64_sve_adjust_stmt_cost (m_vinfo, kind, stmt_info,
                                               vectype, stmt_cost);
 
+  /* Vector promotion and demotion requires us to widen the operation first
+     and only after that perform the conversion.  Unfortunately the mid-end
+     expects this to be doable as a single operation and doesn't pass on
+     enough context here for us to tell which operation is happening.  To
+     account for this we count every promote-demote operation twice and if
+     the previously costed operation was also a promote-demote we reduce
+     the cost of the currently being costed operation to simulate the final
+     conversion cost.  Note that for SVE we can do better here if the converted
+     value comes from a load since the widening load would consume the widening
+     operations.  However since we're in stage 3 we can't change the helper
+     vect_is_extending_load and duplicating the code seems not useful.  */
+  gassign *assign = NULL;
+  if (kind == vec_promote_demote
+      && (assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
+      && gimple_assign_rhs_code (assign) == FLOAT_EXPR)
+    {
+      auto new_count = count * 2 - m_num_last_promote_demote;
+      m_num_last_promote_demote = count;
+      count = new_count;
+    }
+  else
+    m_num_last_promote_demote = 0;
+
   if (stmt_info && aarch64_use_new_vector_costs_p ())
     {
       /* Account for any extra "embedded" costs that apply additively
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_4.c b/gcc/testsuite/gcc.target/aarch64/pr110625_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..34dac19d81a85d63706d54f4cb0c738ce592d5d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n2 -fdump-tree-vect-details" } */
+
+typedef struct {
+  short blue, green, red, opacity;
+} Pixel;
+
+double foo (long n, double *k, Pixel *k_pixels) {
+  double result_2, result_1, result_0;
+  for (; n; n++, k--) {
+    result_0 += *k * k_pixels[n].red;
+    result_1 += *k * k_pixels[n].green;
+    result_2 += *k * k_pixels[n].blue;
+  }
+  return result_0 + result_1 + result_2;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
index 0f96dc2ff007340541c2ba7d51e1ccfa0f3f2d39..4c5e88657408f61156035012212ed542fac45efb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
index 70465f91eba4f80140b2059481eb8f06bbc9ace7..3ff2bd127756b2ff08095513b09325db4779ba02 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
 
 #include
 
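
For reference, the count adjustment can be checked against the numbers quoted in the description with a small standalone model. This is only a sketch and not GCC code: CostedStmt, ConversionCountTracker, total_adjusted_count and kDescriptionStmts are hypothetical names made up for illustration, and the only part taken from the patch is the `count * 2 - m_num_last_promote_demote` update together with the reset to zero for every other statement. The per-statement cost of 2 is inferred from the dump ("6 times ... costs 12") and is an assumption as well.

```cpp
#include <cstddef>

// Hypothetical stand-in for one statement handed to the cost hook.
struct CostedStmt {
  unsigned count;            // statement count reported by the vectorizer
  bool is_float_conversion;  // true for a vec_promote_demote FLOAT_EXPR
};

// Standalone model of the bookkeeping added by the patch (assumed names).
struct ConversionCountTracker {
  // Plays the role of m_num_last_promote_demote.
  unsigned last_promote_demote = 0;

  constexpr unsigned adjust(const CostedStmt &stmt) {
    if (!stmt.is_float_conversion) {
      // Any other statement breaks the promote/demote chain.
      last_promote_demote = 0;
      return stmt.count;
    }
    // Count the statement twice, minus what the previous promote/demote
    // statement already contributed.
    unsigned new_count = stmt.count * 2 - last_promote_demote;
    last_promote_demote = stmt.count;
    return new_count;
  }
};

// Sum the adjusted counts over a sequence of costed statements.
template <std::size_t N>
constexpr unsigned total_adjusted_count(const CostedStmt (&stmts)[N]) {
  ConversionCountTracker tracker;
  unsigned total = 0;
  for (std::size_t i = 0; i < N; ++i)
    total += tracker.adjust(stmts[i]);
  return total;
}

// The two conversions from the dump in the description: counts 6 and 12.
constexpr CostedStmt kDescriptionStmts[] = {{6, true}, {12, true}};
```

With counts 6 and 12 the model yields adjusted counts of 12 and 18, i.e. 30 in total, and at an assumed cost of 2 per statement that corresponds to the 24 and 36 reported in the dump after the patch.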