From patchwork Wed May 10 22:12:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 1779703 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=hXkUnkDB; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QGq5l1stxz213w for ; Thu, 11 May 2023 08:12:43 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 26D7B385702B for ; Wed, 10 May 2023 22:12:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 26D7B385702B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1683756761; bh=XxNC5ARX8u5mR3bYxnAOxJxnIyZupnVfbVQKqJWKXec=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=hXkUnkDBHKg+Z/XkSwJGGod4sixrUENCT/2c8LZ/apJrwQwaX7TJyN+9hbPrNm6uK 07dc/86GSzkUqdSOsJp7UKp874PL4GWv/Sk5VYP/RIydc8yg6HXOAEdrL3Y1xL3EzO egDtBxx1TufG6WqmbxlNGhaDmTqRHgbNJUFQqmtE= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by sourceware.org (Postfix) with ESMTPS id 70F4038582AB for ; Wed, 10 May 2023 22:12:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 70F4038582AB Received: by mail-ed1-x536.google.com with SMTP id 4fb4d7f45d1cf-50c8d87c775so10355451a12.3 for ; Wed, 10 May 2023 15:12:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683756740; x=1686348740; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XxNC5ARX8u5mR3bYxnAOxJxnIyZupnVfbVQKqJWKXec=; b=ckkfVP26r/R+Uuw5aL8gXz0RoO5j98O+eDnnegtqQjEci0W48t+xEPy3nlC07sOwwb zepqv0jISChg2nIwwMEIhelduEzyVEg6E73xTWCOib5ruS31AfFCi1BRc8GT2Mu1UE5f nYSVsfCb1DX4OTArSJ6awO3MludR3fVxSYqzlK82X46efTlq9sITQaCOOAOC0h5EwK8S 6EwvvG/n/98isQ0EM1ftCqEuey+tXt4vzmmSJR/FIPCYsQl1ZCB+A5hhosUF1fidd8de OHNioDmQbytWBSB1JZ3RbwlE6OkqY0oxiA0AqDtInTQkD96vzt4O7Mh4rxTKvAFB6wzj 5ojw== X-Gm-Message-State: AC+VfDy19Pn3IlfaeDHv6RHsIep+H/BanAdtDlnizXZK9Rhr/J5i2mjz fDqGbyV2VfpYPRJuDaihu9tXaJHboCE= X-Google-Smtp-Source: ACHHUZ4ic1h+uxzkEQqVxK8lcMSkRPZBq8U4QMF0H0xQxktfuGAYsG2PWDIs2DZDAhQXbWuHAdGMpw== X-Received: by 2002:a17:906:6a25:b0:966:6316:683e with SMTP id qw37-20020a1709066a2500b009666316683emr12815405ejc.5.1683756739738; Wed, 10 May 2023 15:12:19 -0700 (PDT) Received: from noahgold-desk.lan (2603-8080-1301-76c6-d514-288c-2603-6ec7.res6.spectrum.com. [2603:8080:1301:76c6:d514:288c:2603:6ec7]) by smtp.gmail.com with ESMTPSA id mc27-20020a170906eb5b00b00966330021e9sm3141378ejb.47.2023.05.10.15.12.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 May 2023 15:12:18 -0700 (PDT) To: libc-alpha@sourceware.org Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org Subject: [PATCH v7 2/4] x86: Refactor Intel `init_cpu_features` Date: Wed, 10 May 2023 17:12:11 -0500 Message-Id: <20230510221213.765754-1-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230424050329.1501348-1-goldstein.w.n@gmail.com>v7-0001-x86-Increase-non_temporal_threshold-to-roughly-si.patch> References: <20230424050329.1501348-1-goldstein.w.n@gmail.com>v7-0001-x86-Increase-non_temporal_threshold-to-roughly-si.patch> MIME-Version: 1.0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" This patch should have no affect on existing functionality. The current code, which has a single switch for model detection and setting prefered features, is difficult to follow/extend. The cases use magic numbers and many microarchitectures are missing. This makes it difficult to reason about what is implemented so far and/or how/where to add support for new features. This patch splits the model detection and preference setting stages so that CPU preferences can be set based on a complete list of available microarchitectures, rather than based on model magic numbers. --- sysdeps/x86/cpu-features.c | 401 +++++++++++++++++++++++++++++-------- 1 file changed, 317 insertions(+), 84 deletions(-) diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c index 5bff8ec0b4..9d433f8144 100644 --- a/sysdeps/x86/cpu-features.c +++ b/sysdeps/x86/cpu-features.c @@ -417,6 +417,217 @@ _Static_assert (((index_arch_Fast_Unaligned_Load == index_arch_Fast_Copy_Backward)), "Incorrect index_arch_Fast_Unaligned_Load"); + +/* Intel Family-6 microarch list. */ +enum +{ + /* Atom processors. */ + INTEL_ATOM_BONNELL, + INTEL_ATOM_SALTWELL, + INTEL_ATOM_SILVERMONT, + INTEL_ATOM_AIRMONT, + INTEL_ATOM_GOLDMONT, + INTEL_ATOM_GOLDMONT_PLUS, + INTEL_ATOM_SIERRAFOREST, + INTEL_ATOM_GRANDRIDGE, + INTEL_ATOM_TREMONT, + + /* Bigcore processors. */ + INTEL_BIGCORE_MEROM, + INTEL_BIGCORE_PENRYN, + INTEL_BIGCORE_DUNNINGTON, + INTEL_BIGCORE_NEHALEM, + INTEL_BIGCORE_WESTMERE, + INTEL_BIGCORE_SANDYBRIDGE, + INTEL_BIGCORE_IVYBRIDGE, + INTEL_BIGCORE_HASWELL, + INTEL_BIGCORE_BROADWELL, + INTEL_BIGCORE_SKYLAKE, + INTEL_BIGCORE_AMBERLAKE, + INTEL_BIGCORE_COFFEELAKE, + INTEL_BIGCORE_WHISKEYLAKE, + INTEL_BIGCORE_KABYLAKE, + INTEL_BIGCORE_COMETLAKE, + INTEL_BIGCORE_SKYLAKE_AVX512, + INTEL_BIGCORE_CANNONLAKE, + INTEL_BIGCORE_CASCADELAKE, + INTEL_BIGCORE_COOPERLAKE, + INTEL_BIGCORE_ICELAKE, + INTEL_BIGCORE_TIGERLAKE, + INTEL_BIGCORE_ROCKETLAKE, + INTEL_BIGCORE_SAPPHIRERAPIDS, + INTEL_BIGCORE_RAPTORLAKE, + INTEL_BIGCORE_EMERALDRAPIDS, + INTEL_BIGCORE_METEORLAKE, + INTEL_BIGCORE_LUNARLAKE, + INTEL_BIGCORE_ARROWLAKE, + INTEL_BIGCORE_GRANITERAPIDS, + + /* Mixed (bigcore + atom SOC). */ + INTEL_MIXED_LAKEFIELD, + INTEL_MIXED_ALDERLAKE, + + /* KNL. */ + INTEL_KNIGHTS_MILL, + INTEL_KNIGHTS_LANDING, + + /* Unknown. */ + INTEL_UNKNOWN, +}; + +static unsigned int +intel_get_fam6_microarch (unsigned int model, unsigned int stepping) +{ + switch (model) + { + case 0x1C: + case 0x26: + return INTEL_ATOM_BONNELL; + case 0x27: + case 0x35: + case 0x36: + return INTEL_ATOM_SALTWELL; + case 0x37: + case 0x4A: + case 0x4D: + case 0x5D: + return INTEL_ATOM_SILVERMONT; + case 0x4C: + case 0x5A: + case 0x75: + return INTEL_ATOM_AIRMONT; + case 0x5C: + case 0x5F: + return INTEL_ATOM_GOLDMONT; + case 0x7A: + return INTEL_ATOM_GOLDMONT_PLUS; + case 0xAF: + return INTEL_ATOM_SIERRAFOREST; + case 0xB6: + return INTEL_ATOM_GRANDRIDGE; + case 0x86: + case 0x96: + case 0x9C: + return INTEL_ATOM_TREMONT; + case 0x0F: + case 0x16: + return INTEL_BIGCORE_MEROM; + case 0x17: + return INTEL_BIGCORE_PENRYN; + case 0x1D: + return INTEL_BIGCORE_DUNNINGTON; + case 0x1A: + case 0x1E: + case 0x1F: + case 0x2E: + return INTEL_BIGCORE_NEHALEM; + case 0x25: + case 0x2C: + case 0x2F: + return INTEL_BIGCORE_WESTMERE; + case 0x2A: + case 0x2D: + return INTEL_BIGCORE_SANDYBRIDGE; + case 0x3A: + case 0x3E: + return INTEL_BIGCORE_IVYBRIDGE; + case 0x3C: + case 0x3F: + case 0x45: + case 0x46: + return INTEL_BIGCORE_HASWELL; + case 0x3D: + case 0x47: + case 0x4F: + case 0x56: + return INTEL_BIGCORE_BROADWELL; + case 0x4E: + case 0x5E: + return INTEL_BIGCORE_SKYLAKE; + case 0x8E: + switch (stepping) + { + case 0x09: + return INTEL_BIGCORE_AMBERLAKE; + case 0x0A: + return INTEL_BIGCORE_COFFEELAKE; + case 0x0B: + case 0x0C: + return INTEL_BIGCORE_WHISKEYLAKE; + default: + return INTEL_BIGCORE_KABYLAKE; + } + case 0x9E: + switch (stepping) + { + case 0x0A: + case 0x0B: + case 0x0C: + case 0x0D: + return INTEL_BIGCORE_COFFEELAKE; + default: + return INTEL_BIGCORE_KABYLAKE; + } + case 0xA5: + case 0xA6: + return INTEL_BIGCORE_COMETLAKE; + case 0x66: + return INTEL_BIGCORE_CANNONLAKE; + case 0x55: + switch (stepping) + { + case 0x06: + case 0x07: + return INTEL_BIGCORE_CASCADELAKE; + case 0x0b: + return INTEL_BIGCORE_COOPERLAKE; + default: + return INTEL_BIGCORE_SKYLAKE_AVX512; + } + case 0x6A: + case 0x6C: + case 0x7D: + case 0x7E: + case 0x9D: + return INTEL_BIGCORE_ICELAKE; + case 0x8C: + case 0x8D: + return INTEL_BIGCORE_TIGERLAKE; + case 0xA7: + return INTEL_BIGCORE_ROCKETLAKE; + case 0x8F: + return INTEL_BIGCORE_SAPPHIRERAPIDS; + case 0xB7: + case 0xBA: + case 0xBF: + return INTEL_BIGCORE_RAPTORLAKE; + case 0xCF: + return INTEL_BIGCORE_EMERALDRAPIDS; + case 0xAA: + case 0xAC: + return INTEL_BIGCORE_METEORLAKE; + case 0xbd: + return INTEL_BIGCORE_LUNARLAKE; + case 0xc6: + return INTEL_BIGCORE_ARROWLAKE; + case 0xAD: + case 0xAE: + return INTEL_BIGCORE_GRANITERAPIDS; + case 0x8A: + return INTEL_MIXED_LAKEFIELD; + case 0x97: + case 0x9A: + case 0xBE: + return INTEL_MIXED_ALDERLAKE; + case 0x85: + return INTEL_KNIGHTS_MILL; + case 0x57: + return INTEL_KNIGHTS_LANDING; + default: + return INTEL_UNKNOWN; + } +} + static inline void init_cpu_features (struct cpu_features *cpu_features) { @@ -453,129 +664,151 @@ init_cpu_features (struct cpu_features *cpu_features) if (family == 0x06) { model += extended_model; - switch (model) + unsigned int microarch + = intel_get_fam6_microarch (model, stepping); + + switch (microarch) { - case 0x1c: - case 0x26: - /* BSF is slow on Atom. */ + /* Atom / KNL tuning. */ + case INTEL_ATOM_BONNELL: + /* BSF is slow on Bonnell. */ cpu_features->preferred[index_arch_Slow_BSF] - |= bit_arch_Slow_BSF; + |= bit_arch_Slow_BSF; break; - case 0x57: - /* Knights Landing. Enable Silvermont optimizations. */ - - case 0x7a: - /* Unaligned load versions are faster than SSSE3 - on Goldmont Plus. */ - - case 0x5c: - case 0x5f: /* Unaligned load versions are faster than SSSE3 - on Goldmont. */ + on Airmont, Silvermont, Goldmont, and Goldmont Plus. */ + case INTEL_ATOM_AIRMONT: + case INTEL_ATOM_SILVERMONT: + case INTEL_ATOM_GOLDMONT: + case INTEL_ATOM_GOLDMONT_PLUS: - case 0x4c: - case 0x5a: - case 0x75: - /* Airmont is a die shrink of Silvermont. */ + /* Knights Landing. Enable Silvermont optimizations. */ + case INTEL_KNIGHTS_LANDING: - case 0x37: - case 0x4a: - case 0x4d: - case 0x5d: - /* Unaligned load versions are faster than SSSE3 - on Silvermont. */ cpu_features->preferred[index_arch_Fast_Unaligned_Load] - |= (bit_arch_Fast_Unaligned_Load - | bit_arch_Fast_Unaligned_Copy - | bit_arch_Prefer_PMINUB_for_stringop - | bit_arch_Slow_SSE4_2); + |= (bit_arch_Fast_Unaligned_Load + | bit_arch_Fast_Unaligned_Copy + | bit_arch_Prefer_PMINUB_for_stringop + | bit_arch_Slow_SSE4_2); break; - case 0x86: - case 0x96: - case 0x9c: + case INTEL_ATOM_TREMONT: /* Enable rep string instructions, unaligned load, unaligned - copy, pminub and avoid SSE 4.2 on Tremont. */ + copy, pminub and avoid SSE 4.2 on Tremont. */ cpu_features->preferred[index_arch_Fast_Rep_String] - |= (bit_arch_Fast_Rep_String - | bit_arch_Fast_Unaligned_Load - | bit_arch_Fast_Unaligned_Copy - | bit_arch_Prefer_PMINUB_for_stringop - | bit_arch_Slow_SSE4_2); + |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load + | bit_arch_Fast_Unaligned_Copy + | bit_arch_Prefer_PMINUB_for_stringop + | bit_arch_Slow_SSE4_2); break; + /* Default tuned KNL microarch. */ + case INTEL_KNIGHTS_MILL: + goto default_tuning; + /* Default tuned atom microarch. */ + case INTEL_ATOM_SIERRAFOREST: + case INTEL_ATOM_GRANDRIDGE: + case INTEL_ATOM_SALTWELL: + goto default_tuning; + + /* Bigcore Tuning. */ + case INTEL_UNKNOWN: default: + default_tuning: /* Unknown family 0x06 processors. Assuming this is one of Core i3/i5/i7 processors if AVX is available. */ if (!CPU_FEATURES_CPU_P (cpu_features, AVX)) break; - /* Fall through. */ - - case 0x1a: - case 0x1e: - case 0x1f: - case 0x25: - case 0x2c: - case 0x2e: - case 0x2f: + case INTEL_BIGCORE_NEHALEM: + case INTEL_BIGCORE_WESTMERE: /* Rep string instructions, unaligned load, unaligned copy, and pminub are fast on Intel Core i3, i5 and i7. */ cpu_features->preferred[index_arch_Fast_Rep_String] - |= (bit_arch_Fast_Rep_String - | bit_arch_Fast_Unaligned_Load - | bit_arch_Fast_Unaligned_Copy - | bit_arch_Prefer_PMINUB_for_stringop); + |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load + | bit_arch_Fast_Unaligned_Copy + | bit_arch_Prefer_PMINUB_for_stringop); break; + + /* Default tuned Bigcore microarch. */ + case INTEL_BIGCORE_SANDYBRIDGE: + case INTEL_BIGCORE_IVYBRIDGE: + case INTEL_BIGCORE_HASWELL: + case INTEL_BIGCORE_BROADWELL: + case INTEL_BIGCORE_SKYLAKE: + case INTEL_BIGCORE_AMBERLAKE: + case INTEL_BIGCORE_COFFEELAKE: + case INTEL_BIGCORE_WHISKEYLAKE: + case INTEL_BIGCORE_KABYLAKE: + case INTEL_BIGCORE_COMETLAKE: + case INTEL_BIGCORE_SKYLAKE_AVX512: + case INTEL_BIGCORE_CASCADELAKE: + case INTEL_BIGCORE_COOPERLAKE: + case INTEL_BIGCORE_CANNONLAKE: + case INTEL_BIGCORE_ICELAKE: + case INTEL_BIGCORE_TIGERLAKE: + case INTEL_BIGCORE_ROCKETLAKE: + case INTEL_BIGCORE_RAPTORLAKE: + case INTEL_BIGCORE_METEORLAKE: + case INTEL_BIGCORE_LUNARLAKE: + case INTEL_BIGCORE_ARROWLAKE: + case INTEL_BIGCORE_SAPPHIRERAPIDS: + case INTEL_BIGCORE_EMERALDRAPIDS: + case INTEL_BIGCORE_GRANITERAPIDS: + goto default_tuning; + + /* Default tuned Mixed (bigcore + atom SOC). */ + case INTEL_MIXED_LAKEFIELD: + case INTEL_MIXED_ALDERLAKE: + goto default_tuning; } - /* Disable TSX on some processors to avoid TSX on kernels that - weren't updated with the latest microcode package (which - disables broken feature by default). */ - switch (model) + /* Disable TSX on some processors to avoid TSX on kernels that + weren't updated with the latest microcode package (which + disables broken feature by default). */ + switch (microarch) { - case 0x55: + case INTEL_BIGCORE_SKYLAKE_AVX512: + /* 0x55 && stepping <= 5 is SKYLAKE_AVX512. Cascadelake and + Cooperlake also have model 0x55 but stepping 5/6 and 11 + respectively so double check the stepping to be safe. */ if (stepping <= 5) goto disable_tsx; break; - case 0x8e: - /* NB: Although the errata documents that for model == 0x8e, - only 0xb stepping or lower are impacted, the intention of - the errata was to disable TSX on all client processors on - all steppings. Include 0xc stepping which is an Intel - Core i7-8665U, a client mobile processor. */ - case 0x9e: - if (stepping > 0xc) + + case INTEL_BIGCORE_SKYLAKE: + case INTEL_BIGCORE_AMBERLAKE: + case INTEL_BIGCORE_COFFEELAKE: + case INTEL_BIGCORE_WHISKEYLAKE: + case INTEL_BIGCORE_KABYLAKE: + /* NB: Although the errata documents that for model == 0x8e + (skylake client), only 0xb stepping or lower are impacted, + the intention of the errata was to disable TSX on all client + processors on all steppings. Include 0xc stepping which is + an Intel Core i7-8665U, a client mobile processor. */ + if ((model == 0x8e || model == 0x9e) && stepping > 0xc) break; - /* Fall through. */ - case 0x4e: - case 0x5e: - { + /* Disable Intel TSX and enable RTM_ALWAYS_ABORT for processors listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html */ -disable_tsx: + disable_tsx: CPU_FEATURE_UNSET (cpu_features, HLE); CPU_FEATURE_UNSET (cpu_features, RTM); CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT); - } - break; - case 0x3f: - /* Xeon E7 v3 with stepping >= 4 has working TSX. */ - if (stepping >= 4) break; - /* Fall through. */ - case 0x3c: - case 0x45: - case 0x46: - /* Disable Intel TSX on Haswell processors (except Xeon E7 v3 - with stepping >= 4) to avoid TSX on kernels that weren't - updated with the latest microcode package (which disables - broken feature by default). */ - CPU_FEATURE_UNSET (cpu_features, RTM); - break; + + case INTEL_BIGCORE_HASWELL: + /* Xeon E7 v3 (model == 0x3f) with stepping >= 4 has working + TSX. Haswell also include other model numbers that have + working TSX. */ + if (model == 0x3f && stepping >= 4) + break; + + CPU_FEATURE_UNSET (cpu_features, RTM); + break; } } From patchwork Wed May 10 22:12:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 1779704 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=veXwR2mt; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QGq622Zc6z213w for ; Thu, 11 May 2023 08:12:58 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 550E63856249 for ; Wed, 10 May 2023 22:12:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 550E63856249 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1683756776; bh=0JB530HnaG2OZ2huNedSxSI1hsjYpY+O5SWkAyi1ueA=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=veXwR2mtazF8OkT4Gon8ruMkoHYs/PmzH2vt/olK17wpaYLVMX3XJO0+wZv0vaupi lW8ggxn6rZhGy6fKAbITfnRXHs9RgbTFg5b+csHRhptRWjufhrGT1p6veuYdn8Xi+t k1Fhmcm2MH53C+XemJcoAYxPokO99r98N4KBBiWw= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by sourceware.org (Postfix) with ESMTPS id 30A6E3857036 for ; Wed, 10 May 2023 22:12:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 30A6E3857036 Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-965d2749e2eso1114453466b.1 for ; Wed, 10 May 2023 15:12:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683756757; x=1686348757; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0JB530HnaG2OZ2huNedSxSI1hsjYpY+O5SWkAyi1ueA=; b=Dwlmq4AZFnrclwR1cpr55WbI3wq5n+VCsVj5n4Ab8D2vETBBon51aQK+Qb3eM5+SBe pZZy6sKaON58sDpMpfjFNxk5/wwLtM/j/rIwR74YUuIaxWKBvj6PB+vetmEwqehaO5Wc zDHnK8HXqitjIxz76bBF2oN5cM/Za/5hceuDZN4YR2h87a52/V4fENfdGpW+wD/lRHLq fdSvklJAPGENOvulquRLuFi8P9W4BkTPNnKS7yJDpygRfudsJs9lDbE72m934+N36MjM ZVVb6jNxf2L2RHqno3xyKJ2C3fUaUXnzs4IitEMqIYo8tdjDDVKfH8j8iYl78XHOj0zj bmGQ== X-Gm-Message-State: AC+VfDwpzwAkFyijNdr0V/s6vaVXuln5VMaYNMO5ZDAWSLdkXLurtYvd SSOevYGO14NCXF/JMB+F+kIxDN2x8bw= X-Google-Smtp-Source: ACHHUZ6pSbWrNxendHQocIQ521FbGM5RNR+8o1EXnedWGgV27tZRTwah/V92D88owBWrVPxxVEiItw== X-Received: by 2002:a17:907:7fa2:b0:932:4255:5902 with SMTP id qk34-20020a1709077fa200b0093242555902mr18879351ejc.76.1683756757059; Wed, 10 May 2023 15:12:37 -0700 (PDT) Received: from noahgold-desk.lan (2603-8080-1301-76c6-d514-288c-2603-6ec7.res6.spectrum.com. [2603:8080:1301:76c6:d514:288c:2603:6ec7]) by smtp.gmail.com with ESMTPSA id mc27-20020a170906eb5b00b00966330021e9sm3141378ejb.47.2023.05.10.15.12.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 May 2023 15:12:36 -0700 (PDT) To: libc-alpha@sourceware.org Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org Subject: [PATCH v7 3/4] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Date: Wed, 10 May 2023 17:12:12 -0500 Message-Id: <20230510221213.765754-2-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230510221213.765754-1-goldstein.w.n@gmail.com> References: <20230424050329.1501348-1-goldstein.w.n@gmail.com>v7-0001-x86-Increase-non_temporal_threshold-to-roughly-si.patch> <20230510221213.765754-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Different systems prefer a different divisors. From benchmarks[1] so far the following divisors have been found: ICX : 2 SKX : 2 BWD : 8 For Intel, we are generalizing that BWD and older prefers 8 as a divisor, and SKL and newer prefers 2. This number can be further tuned as benchmarks are run. [1]: https://github.com/goldsteinn/memcpy-nt-benchmarks --- sysdeps/x86/cpu-features.c | 16 +++++++++++++-- sysdeps/x86/dl-cacheinfo.h | 32 ++++++++++++++++++------------ sysdeps/x86/include/cpu-features.h | 3 +++ 3 files changed, 36 insertions(+), 15 deletions(-) diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c index 9d433f8144..4cc1cd9fed 100644 --- a/sysdeps/x86/cpu-features.c +++ b/sysdeps/x86/cpu-features.c @@ -637,6 +637,7 @@ init_cpu_features (struct cpu_features *cpu_features) unsigned int stepping = 0; enum cpu_features_kind kind; + cpu_features->cachesize_non_temporal_divisor = 4; #if !HAS_CPUID if (__get_cpuid_max (0, 0) == 0) { @@ -720,8 +721,8 @@ init_cpu_features (struct cpu_features *cpu_features) of Core i3/i5/i7 processors if AVX is available. */ if (!CPU_FEATURES_CPU_P (cpu_features, AVX)) break; - case INTEL_BIGCORE_NEHALEM: - case INTEL_BIGCORE_WESTMERE: + + enable_modern_features: /* Rep string instructions, unaligned load, unaligned copy, and pminub are fast on Intel Core i3, i5 and i7. */ cpu_features->preferred[index_arch_Fast_Rep_String] @@ -730,11 +731,20 @@ init_cpu_features (struct cpu_features *cpu_features) | bit_arch_Prefer_PMINUB_for_stringop); break; + case INTEL_BIGCORE_NEHALEM: + case INTEL_BIGCORE_WESTMERE: + /* Older CPUs prefer non-temporal stores at lower threshold. */ + cpu_features->cachesize_non_temporal_divisor = 8; + goto enable_modern_features; + /* Default tuned Bigcore microarch. */ case INTEL_BIGCORE_SANDYBRIDGE: case INTEL_BIGCORE_IVYBRIDGE: case INTEL_BIGCORE_HASWELL: case INTEL_BIGCORE_BROADWELL: + cpu_features->cachesize_non_temporal_divisor = 8; + goto default_tuning; + case INTEL_BIGCORE_SKYLAKE: case INTEL_BIGCORE_AMBERLAKE: case INTEL_BIGCORE_COFFEELAKE: @@ -755,11 +765,13 @@ init_cpu_features (struct cpu_features *cpu_features) case INTEL_BIGCORE_SAPPHIRERAPIDS: case INTEL_BIGCORE_EMERALDRAPIDS: case INTEL_BIGCORE_GRANITERAPIDS: + cpu_features->cachesize_non_temporal_divisor = 2; goto default_tuning; /* Default tuned Mixed (bigcore + atom SOC). */ case INTEL_MIXED_LAKEFIELD: case INTEL_MIXED_ALDERLAKE: + cpu_features->cachesize_non_temporal_divisor = 2; goto default_tuning; } diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h index 4a1a5423ff..864b00a521 100644 --- a/sysdeps/x86/dl-cacheinfo.h +++ b/sysdeps/x86/dl-cacheinfo.h @@ -738,19 +738,25 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) cpu_features->level3_cache_linesize = level3_cache_linesize; cpu_features->level4_cache_size = level4_cache_size; - /* The default setting for the non_temporal threshold is 1/4 of size - of the chip's cache. For most Intel and AMD processors with an - initial release date between 2017 and 2023, a thread's typical - share of the cache is from 18-64MB. Using the 1/4 L3 is meant to - estimate the point where non-temporal stores begin outcompeting - REP MOVSB. As well the point where the fact that non-temporal - stores are forced back to main memory would already occurred to the - majority of the lines in the copy. Note, concerns about the - entire L3 cache being evicted by the copy are mostly alleviated - by the fact that modern HW detects streaming patterns and - provides proper LRU hints so that the maximum thrashing - capped at 1/associativity. */ - unsigned long int non_temporal_threshold = shared / 4; + unsigned long int cachesize_non_temporal_divisor + = cpu_features->cachesize_non_temporal_divisor; + if (cachesize_non_temporal_divisor <= 0) + cachesize_non_temporal_divisor = 4; + + /* The default setting for the non_temporal threshold is [1/2, 1/8] of size + of the chip's cache (depending on `cachesize_non_temporal_divisor` which + is microarch specific. The defeault is 1/4). For most Intel and AMD + processors with an initial release date between 2017 and 2023, a thread's + typical share of the cache is from 18-64MB. Using a reasonable size + fraction of L3 is meant to estimate the point where non-temporal stores + begin outcompeting REP MOVSB. As well the point where the fact that + non-temporal stores are forced back to main memory would already occurred + to the majority of the lines in the copy. Note, concerns about the entire + L3 cache being evicted by the copy are mostly alleviated by the fact that + modern HW detects streaming patterns and provides proper LRU hints so that + the maximum thrashing capped at 1/associativity. */ + unsigned long int non_temporal_threshold + = shared / cachesize_non_temporal_divisor; /* If no ERMS, we use the per-thread L3 chunking. Normal cacheable stores run a higher risk of actually thrashing the cache as they don't have a HW LRU hint. As well, there performance in highly parallel situations is diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h index 40b8129d6a..f5b9dd54fe 100644 --- a/sysdeps/x86/include/cpu-features.h +++ b/sysdeps/x86/include/cpu-features.h @@ -915,6 +915,9 @@ struct cpu_features unsigned long int shared_cache_size; /* Threshold to use non temporal store. */ unsigned long int non_temporal_threshold; + /* When no user non_temporal_threshold is specified. We default to + cachesize / cachesize_non_temporal_divisor. */ + unsigned long int cachesize_non_temporal_divisor; /* Threshold to use "rep movsb". */ unsigned long int rep_movsb_threshold; /* Threshold to stop using "rep movsb". */ From patchwork Wed May 10 22:12:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 1779705 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=cFGZwSL8; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QGq6Z6KT1z213w for ; Thu, 11 May 2023 08:13:26 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4A5623855591 for ; Wed, 10 May 2023 22:13:24 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4A5623855591 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1683756804; bh=qnTJ2mT+gxcIuB98btex6OlvzSnmUGx3MQnC472k8G4=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=cFGZwSL8EBBkPh80zvUFcIwI7GmZK9Eg96RWoBFoE6wpe+AVVHx6pHBeb8Y8lQyCg 8Z6kulRvNwtCsm2q3OzHaC8HClE+wbev/13rp5yNXP7MpYpzCcdenoQ3WlNDjSgKe1 zaIB/zhihpOLsdgYdr0+MdLZWMQuKBEArzqO5yg4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by sourceware.org (Postfix) with ESMTPS id D2AD638582AB for ; Wed, 10 May 2023 22:12:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D2AD638582AB Received: by mail-ed1-x52b.google.com with SMTP id 4fb4d7f45d1cf-50bd37ca954so73456659a12.0 for ; Wed, 10 May 2023 15:12:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683756760; x=1686348760; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qnTJ2mT+gxcIuB98btex6OlvzSnmUGx3MQnC472k8G4=; b=lr5SzrYb/7MgQkdgqcoLr+Vg/7ZBMdpMwFBg/ZgSk+LTf9mWIsYGdP3UBttByAh8FR JQmOmRK7N6hlp45OyZ3trrHzAxvoTDy6v2my7Z2YWqTzMEEYsEiInflIusPnnxau06QK o1aRxrDQS9ryAkIin/NfRzXaha2Km91mL3kCUQK/HLgPfuu/1SxsyUzcyOrg4Hz6ppm1 Hu+dhUfvJANW8Xwjb4a1U0cPlpSj71Kf8uVvCz35sMaZzsdyyHoVASji6CG9JJFotHfF TtJ3EZE/ow9U2TJKZ4CKPsaLZCmPhKTDddrEpaN9q86wCOprKNIhGTpwmco+sZ7Ze4gP raQg== X-Gm-Message-State: AC+VfDzDc8F3DPgEZlJP3fl+d7lhzq3arOWGUJkJEuQKAiOMlZw4q+W3 9b6sGPsvR8TyVX0CDa5U0Z8RdHE4phE= X-Google-Smtp-Source: ACHHUZ7RBy4xznlssotgy71AqaleuqjCj8GNEdewp5yx5HmFhcv920g9vKeSZEiq6boIFnrB2BMq8w== X-Received: by 2002:a17:907:1622:b0:965:aa65:233f with SMTP id hb34-20020a170907162200b00965aa65233fmr17696979ejc.2.1683756759576; Wed, 10 May 2023 15:12:39 -0700 (PDT) Received: from noahgold-desk.lan (2603-8080-1301-76c6-d514-288c-2603-6ec7.res6.spectrum.com. [2603:8080:1301:76c6:d514:288c:2603:6ec7]) by smtp.gmail.com with ESMTPSA id mc27-20020a170906eb5b00b00966330021e9sm3141378ejb.47.2023.05.10.15.12.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 May 2023 15:12:38 -0700 (PDT) To: libc-alpha@sourceware.org Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org Subject: [PATCH v7 4/4] x86: Tune 'Saltwell' microarch the same was a 'Bonnell' Date: Wed, 10 May 2023 17:12:13 -0500 Message-Id: <20230510221213.765754-3-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230510221213.765754-1-goldstein.w.n@gmail.com> References: <20230424050329.1501348-1-goldstein.w.n@gmail.com>v7-0001-x86-Increase-non_temporal_threshold-to-roughly-si.patch> <20230510221213.765754-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Saltwell is just a die shrink of Bonnell, so the same micro-architectural optimization preferences apply. --- sysdeps/x86/cpu-features.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c index 4cc1cd9fed..0fb02d2f39 100644 --- a/sysdeps/x86/cpu-features.c +++ b/sysdeps/x86/cpu-features.c @@ -672,7 +672,8 @@ init_cpu_features (struct cpu_features *cpu_features) { /* Atom / KNL tuning. */ case INTEL_ATOM_BONNELL: - /* BSF is slow on Bonnell. */ + case INTEL_ATOM_SALTWELL: + /* BSF is slow on Bonnell/Saltwell. */ cpu_features->preferred[index_arch_Slow_BSF] |= bit_arch_Slow_BSF; break; @@ -710,7 +711,6 @@ init_cpu_features (struct cpu_features *cpu_features) /* Default tuned atom microarch. */ case INTEL_ATOM_SIERRAFOREST: case INTEL_ATOM_GRANDRIDGE: - case INTEL_ATOM_SALTWELL: goto default_tuning; /* Bigcore Tuning. */