From patchwork Tue Sep 3 13:07:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1980046 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=ucw.cz header.i=@ucw.cz header.a=rsa-sha256 header.s=gen1 header.b=MuUAJk9y; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WymBL73rDz1yfv for ; Tue, 3 Sep 2024 23:07:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 434023860C3B for ; Tue, 3 Sep 2024 13:07:34 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 6741F385DDCA for ; Tue, 3 Sep 2024 13:07:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6741F385DDCA Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kam.mff.cuni.cz ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6741F385DDCA Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.113.20.16 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725368835; cv=none; b=KkxUNp+XXhBuZoyTj7vpbkE3oix52ByVck8DwjsSXxgVyFeQXuW6wMNRLNU9ABaN9qX5/Rd93as+5EiW/F/FEd3QmA4Bap9e7LMhP0Pbs2reVMd8uwQ1CLpMVc+5gOFOCHXXEECZtDVq42vvBCAMmLtQWPVaVlcfCg35kKS+Sr0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725368835; c=relaxed/simple; bh=+2Fe3FPKhQBFD5K4az9vnK00mFJfyCDIU48IQ8BXBZ4=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=D23Um7V6Zl4JQWGbrHZcKr+SP0IZ1nukmfYug9krqfV/3RfmC2jseU9vZ6mkDV3wM5ZtjdBsBiOayP5CYgmJ3s4dBm+ScS8B3JYmXoX1zR3KWlzwRb0cZXbpeWL0hfP+6beqD/1rogBuuyIO7BbSDj6aW8nT3Dud14aMsNHTy4o= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id E28D32879AF; Tue, 3 Sep 2024 15:07:10 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1; t=1725368830; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type; bh=AmTgmzBI0qBYUuknV7HJ1uhYxA1r5U7K3E/4Id7QLiA=; b=MuUAJk9yAGWsUzNML56sZmm9ClhekKWs+OzhwPb9kg8wxCcIbJDV/InY5ede9sQPauP7Xe faS014qVmKhSV0yVRz2S2Df53qNZFKqZf6BKruEB5lumVUn7/Sov/7xH2f8lEEBnnklXoa u0f6Tltc+CL4UbeCVgKC9DjFzsLYdMY= Date: Tue, 3 Sep 2024 15:07:10 +0200 From: Jan Hubicka To: gcc-patches@gcc.gnu.org, mjambor@suse.cz Subject: Zen5 tuning part 2: disable gather and scatter Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, JMQ_SPF_NEUTRAL, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org Hi, We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to be more complicated. gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot products in TSVC. Curiously enough breaking these out into microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. gather is loss if indices are sequential, neutral if they are random and win for some strides (4, 8). This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be conistent and disable gather unless we work out a good heuristics on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done. I opened PR116582 with some examples of wins and loses Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index da1a3d6a3c6..ed26136faee 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -476,35 +476,35 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes", /* X86_TUNE_USE_GATHER_2PARTS: Use gather instructions for vectors with 2 elements. */ DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts", - ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID + ~(m_ZNVER | m_CORE_HYBRID | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS)) /* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2 elements. */ DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts", - ~(m_ZNVER4)) + ~(m_ZNVER4 | m_ZNVER5)) /* X86_TUNE_USE_GATHER_4PARTS: Use gather instructions for vectors with 4 elements. */ DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts", - ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID + ~(m_ZNVER | m_CORE_HYBRID | m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS)) /* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4 elements. */ DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts", - ~(m_ZNVER4)) + ~(m_ZNVER4 | m_ZNVER5)) /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more elements. */ DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts", - ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_CORE_HYBRID | m_CORE_ATOM + ~(m_ZNVER | m_CORE_HYBRID | m_CORE_ATOM | m_YONGFENG | m_SHIJIDADAO | m_GENERIC | m_GDS)) /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more elements. */ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts", - ~(m_ZNVER4)) + ~(m_ZNVER4 | m_ZNVER5)) /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or smaller FMA chain. */