From patchwork Tue Nov 14 08:34:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 1863519
Message-ID: <41e81bc766f9cc66cd426d313e4a4c858360822e.camel@xry111.site>
Subject: Pushed: [PATCH v2] LoongArch: Use finer-grained DBAR hints
From: Xi Ruoyao
To: chenglulu, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn
Date: Tue, 14 Nov 2023 16:34:13 +0800
In-Reply-To: <01e49973-227f-f7aa-f1f9-25c4392dfb78@loongson.cn>
References: <20231113231837.369907-1-xry111@xry111.site>
 <01e49973-227f-f7aa-f1f9-25c4392dfb78@loongson.cn>
List-Id: Gcc-patches mailing list

On Tue, 2023-11-14 at 10:26 +0800, chenglulu wrote:
> Hi,
>
>  * Before calling this template, the function get_memmodel is called to
>    process memmodel, which has a piece of code:
>
>      /* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
>         be conservative and promote consume to acquire. */
>      if (val == MEMMODEL_CONSUME)
>        val = MEMMODEL_ACQUIRE;
>
>  * So I think MEMMODEL_CONSUME don't need to be processed here either.
>
> Otherwise is OK.

Thanks, I've removed case MEMMODEL_CONSUME and there seems to be no
issue.  The RISC-V mem_thread_fence expansion does not handle
MEMMODEL_CONSUME either.

Pushed r14-5432 with case MEMMODEL_CONSUME removed and the comment
adjusted, as attached.

But curiously there are various references to MEMMODEL_CONSUME in
gcc/config:

$ grep -lr MEMMODEL_CONSUME gcc/config
gcc/config/aarch64/aarch64.cc
gcc/config/riscv/riscv.cc
gcc/config/ia64/ia64.cc
gcc/config/ia64/sync.md
gcc/config/gcn/gcn.md
gcc/config/loongarch/loongarch.cc
gcc/config/rs6000/rs6000.cc
gcc/config/rs6000/sync.md
gcc/config/nvptx/nvptx.cc

Maybe all of them are redundant?

From 4a70bfbf686c2b6a1ecd83fe851de826c612c3e0 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao
Date: Tue, 14 Nov 2023 05:32:38 +0800
Subject: [PATCH v2] LoongArch: Use finer-grained DBAR hints

LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:

- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1: false)
- Bit 0: barrier for succeeding write (0: true, 1: false)

LLVM has already utilized them for different memory orders [3]:

- Bit 4 is always set to one because it's only intended to be zero for
  things like MMIO devices, which are out of the scope of memory orders.
- An acquire barrier is used to implement acquire loads like

    ld.d $a1, $t0, 0
    dbar acquire_hint

  where the load operation (ld.d) should not be reordered with any load
  or store operation after the acquire load.  To satisfy this
  constraint, we need to prevent the load operation from being reordered
  after the barrier, and also prevent any following load/store operation
  from being reordered before the barrier.  Thus bits 0, 1, and 3 must
  be zero, and bit 2 can be one, so acquire_hint should be 0b10100.
- A release barrier is used to implement release stores like

    dbar release_hint
    st.d $a1, $t0, 0

  where the store operation (st.d) should not be reordered with any load
  or store operation before the release store.
  So we need to prevent the store operation from being reordered before
  the barrier, and also prevent any preceding load/store operation from
  being reordered after the barrier.  So bits 0, 2, 3 must be zero, and
  bit 1 can be one.  So release_hint should be 0b10010.

A similar mapping has been utilized for RISC-V GCC [4], LoongArch Linux
kernel [1], and LoongArch LLVM [3].  So the mapping should be correct.
And I've also bootstrapped & regtested GCC on a LA664 with this patch.

The LoongArch CPUs should treat "unknown" hints as dbar 0, so we can
unconditionally emit the new hints without a compiler switch.

[1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
[2]: https://github.com/loongson-community/docs/pull/12
[3]: https://github.com/llvm/llvm-project/pull/68787
[4]: https://gcc.gnu.org/r14-406

gcc/ChangeLog:

	* config/loongarch/sync.md (mem_thread_fence): Remove redundant
	check.
	(mem_thread_fence_1): Emit finer-grained DBAR hints for
	different memory models, instead of 0.
---
 gcc/config/loongarch/sync.md | 51 +++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 9924d522bcd..1ad0c63e0d9 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -50,23 +50,56 @@ (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand" "")] ;; model
   ""
 {
-  if (INTVAL (operands[0]) != MEMMODEL_RELAXED)
-    {
-      rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-      MEM_VOLATILE_P (mem) = 1;
-      emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
-    }
+  rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (mem) = 1;
+  emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
+
   DONE;
 })
 
-;; Until the LoongArch memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
+;; DBAR hint encoding for LA664 and later micro-architectures, paraphrased from
+;; the Linux patch revealing it [1]:
+;;
+;; - Bit 4: kind of constraint (0: completion, 1: ordering)
+;; - Bit 3: barrier for previous read (0: true, 1: false)
+;; - Bit 2: barrier for previous write (0: true, 1: false)
+;; - Bit 1: barrier for succeeding read (0: true, 1: false)
+;; - Bit 0: barrier for succeeding write (0: true, 1: false)
+;;
+;; [1]: https://git.kernel.org/torvalds/c/e031a5f3f1ed
+;;
+;; Implementations without support for the finer-granularity hints simply treat
+;; all as the full barrier (DBAR 0), so we can unconditionally start emitting the
+;; more precise hints right away.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
         (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
    (match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "dbar\t0")
+  {
+    enum memmodel model = memmodel_base (INTVAL (operands[1]));
+
+    switch (model)
+      {
+      case MEMMODEL_ACQUIRE:
+        return "dbar\t0b10100";
+      case MEMMODEL_RELEASE:
+        return "dbar\t0b10010";
+      case MEMMODEL_ACQ_REL:
+      case MEMMODEL_SEQ_CST:
+        return "dbar\t0b10000";
+      default:
+        /* GCC internal: "For the '__ATOMIC_RELAXED' model no instructions
+           need to be issued and this expansion is not invoked."
+
+           __atomic builtins doc: "Consume is implemented using the
+           stronger acquire memory order because of a deficiency in C++11's
+           semantics."  See PR 59448 and get_memmodel in builtins.cc.
+
+           Other values should not be returned by memmodel_base.  */
+        gcc_unreachable ();
+      }
+  })
 
 ;; Atomic memory operations.
 
-- 
2.42.1
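
To double-check the hint values used in the patch, here is a minimal
sketch in C that composes them from the bit definitions given in the
commit message.  The names dbar_bit, fence_kind and dbar_hint are
hypothetical and do not appear in GCC, LLVM or the kernel.  Returning 0
(the full barrier) for an unknown kind is likewise only an assumption of
this sketch.

```c
#include <assert.h>

/* Hypothetical names for illustration only.  A set bit 0..3 means the
   corresponding class of accesses is NOT constrained by the barrier.  */
enum dbar_bit
{
  DBAR_ORDERING        = 1 << 4, /* 0: completion, 1: ordering */
  DBAR_SKIP_PREV_READ  = 1 << 3, /* previous reads unconstrained */
  DBAR_SKIP_PREV_WRITE = 1 << 2, /* previous writes unconstrained */
  DBAR_SKIP_SUCC_READ  = 1 << 1, /* succeeding reads unconstrained */
  DBAR_SKIP_SUCC_WRITE = 1 << 0  /* succeeding writes unconstrained */
};

/* Hypothetical stand-in for the memory models handled by the patch.  */
enum fence_kind
{
  FENCE_ACQUIRE,
  FENCE_RELEASE,
  FENCE_FULL /* acq_rel and seq_cst */
};

/* Return the DBAR hint for KIND, following the reasoning in the commit
   message.  */
unsigned
dbar_hint (enum fence_kind kind)
{
  switch (kind)
    {
    case FENCE_ACQUIRE:
      /* Order the previous read against all succeeding accesses;
         previous writes may stay unconstrained (bit 2 set).  */
      return DBAR_ORDERING | DBAR_SKIP_PREV_WRITE;
    case FENCE_RELEASE:
      /* Order all previous accesses against the succeeding write;
         succeeding reads may stay unconstrained (bit 1 set).  */
      return DBAR_ORDERING | DBAR_SKIP_SUCC_READ;
    case FENCE_FULL:
      /* Order everything against everything.  */
      return DBAR_ORDERING;
    }

  /* Assumption of this sketch: fall back to the full barrier, dbar 0.  */
  return 0;
}
```

With these definitions dbar_hint (FENCE_ACQUIRE) evaluates to 0x14
(0b10100), dbar_hint (FENCE_RELEASE) to 0x12 (0b10010), and
dbar_hint (FENCE_FULL) to 0x10 (0b10000), matching the strings returned
by mem_thread_fence_1.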