From patchwork Fri Jul 14 15:14:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1807861
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org
Subject: [PATCH v1] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.
Date: Fri, 14 Jul 2023 10:14:59 -0500
Message-Id: <20230714151459.3357038-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
---
 sysdeps/x86/dl-cacheinfo.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index c98fa57a7b..0436ffb349 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -757,12 +757,21 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      the maximum thrashing capped at 1/associativity. */
   unsigned long int non_temporal_threshold
       = shared / cachesize_non_temporal_divisor;
+
+  /* If the computed non_temporal_threshold <= 3/4 * per-thread L3, we most
+     likely have incorrect/incomplete cache info in which case, default to
+     3/4 * per-thread L3 to avoid regressions. */
+  unsigned long int non_temporal_threshold_lowbound
+      = shared_per_thread * 3 / 4;
+  if (non_temporal_threshold < non_temporal_threshold_lowbound)
+    non_temporal_threshold = non_temporal_threshold_lowbound;
+
   /* If no ERMS, we use the per-thread L3 chunking. Normal cacheable stores run
      a higher risk of actually thrashing the cache as they don't have a HW LRU
      hint. As well, their performance in highly parallel situations is
      noticeably worse. */
   if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-    non_temporal_threshold = shared_per_thread * 3 / 4;
+    non_temporal_threshold = non_temporal_threshold_lowbound;
   /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
      'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
      if that operation cannot overflow. Minimum of 0x4040 (16448) because the