From patchwork Tue Jul 18 15:27:59 2023
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1809348
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org
Subject: [PATCH v3] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.
Date: Tue, 18 Jul 2023 10:27:59 -0500
Message-Id: <20230718152759.219727-1-goldstein.w.n@gmail.com>
In-Reply-To: <20230714151459.3357038-1-goldstein.w.n@gmail.com>
References: <20230714151459.3357038-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
---
 sysdeps/x86/dl-cacheinfo.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index c98fa57a7b..2586ff0e31 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -745,8 +745,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
 
   /* The default setting for the non_temporal threshold is [1/8, 1/2] of size
      of the chip's cache (depending on `cachesize_non_temporal_divisor` which
-     is microarch specific. The default is 1/4). For most Intel and AMD
-     processors with an initial release date between 2017 and 2023, a thread's
+     is microarch specific. The default is 1/4). For most Intel processors
+     with an initial release date between 2017 and 2023, a thread's
      typical share of the cache is from 18-64MB. Using a reasonable size
      fraction of L3 is meant to estimate the point where non-temporal stores
      begin out-competing REP MOVSB. As well the point where the fact that
@@ -757,12 +757,21 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      the maximum thrashing capped at 1/associativity. */
   unsigned long int non_temporal_threshold
       = shared / cachesize_non_temporal_divisor;
+
+  /* If the computed non_temporal_threshold <= 3/4 * per-thread L3, we most
+     likely have incorrect/incomplete cache info in which case, default to
+     3/4 * per-thread L3 to avoid regressions. */
+  unsigned long int non_temporal_threshold_lowbound
+      = shared_per_thread * 3 / 4;
+  if (non_temporal_threshold < non_temporal_threshold_lowbound)
+    non_temporal_threshold = non_temporal_threshold_lowbound;
+
   /* If no ERMS, we use the per-thread L3 chunking. Normal cacheable stores run
      a higher risk of actually thrashing the cache as they don't have a HW LRU
      hint. As well, their performance in highly parallel situations is
      noticeably worse. */
   if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-    non_temporal_threshold = shared_per_thread * 3 / 4;
+    non_temporal_threshold = non_temporal_threshold_lowbound;
   /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
      'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
      if that operation cannot overflow. Minimum of 0x4040 (16448) because the