From patchwork Sun May 10 16:25:00 2020
X-Patchwork-Submitter: "Guilherme G. Piccoli"
X-Patchwork-Id: 1287224
From: "Guilherme G.
Piccoli"
To: kernel-team@lists.ubuntu.com
Subject: [X/B][PATCH 1/2] x86/tsc: Make calibration refinement more robust
Date: Sun, 10 May 2020 13:25:00 -0300
Message-Id: <20200510162501.27181-2-gpiccoli@canonical.com>
In-Reply-To: <20200510162501.27181-1-gpiccoli@canonical.com>
References: <20200510162501.27181-1-gpiccoli@canonical.com>
Cc: gpiccoli@canonical.com

From: Daniel Vacek

BugLink: https://bugs.launchpad.net/bugs/1877858

The threshold in tsc_read_refs() is constant which may favor slower CPUs
but may not be optimal for simple reading of reference on faster ones.
Hence make it proportional to tsc_khz when available to compensate for
this.

The threshold guards against any disturbance like IRQs, NMIs, SMIs or
CPU stealing by host on guest systems so rename it accordingly and fix
comments as well.

Also on some systems there is noticeable DMI bus contention at some
point during boot keeping the readout failing (observed with about one
in ~300 boots when testing). In that case retry also the second readout
instead of simply bailing out unrefined. Usually the next second the
readout returns fast just fine without any issues.

Signed-off-by: Daniel Vacek
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@redhat.com
(cherry picked from commit a786ef152cdcfebc923a67f63c7815806eefcf81)
Signed-off-by: Guilherme G. Piccoli
---
 arch/x86/kernel/tsc.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 83f521254241..f7ff73271199 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -281,15 +281,16 @@ static int __init tsc_setup(char *str)
 
 __setup("tsc=", tsc_setup);
 
-#define MAX_RETRIES		5
-#define SMI_TRESHOLD		50000
+#define MAX_RETRIES		5
+#define TSC_DEFAULT_THRESHOLD	0x20000
 
 /*
- * Read TSC and the reference counters. Take care of SMI disturbance
+ * Read TSC and the reference counters. Take care of any disturbances
  */
 static u64 tsc_read_refs(u64 *p, int hpet)
 {
 	u64 t1, t2;
+	u64 thresh = tsc_khz ? tsc_khz >> 5 : TSC_DEFAULT_THRESHOLD;
 	int i;
 
 	for (i = 0; i < MAX_RETRIES; i++) {
@@ -299,7 +300,7 @@ static u64 tsc_read_refs(u64 *p, int hpet)
 		else
 			*p = acpi_pm_read_early();
 		t2 = get_cycles();
-		if ((t2 - t1) < SMI_TRESHOLD)
+		if ((t2 - t1) < thresh)
 			return t2;
 	}
 	return ULLONG_MAX;
@@ -700,15 +701,15 @@ unsigned long native_calibrate_cpu(void)
 	 *    zero. In each wait loop iteration we read the TSC and check
 	 *    the delta to the previous read. We keep track of the min
 	 *    and max values of that delta. The delta is mostly defined
-	 *    by the IO time of the PIT access, so we can detect when a
-	 *    SMI/SMM disturbance happened between the two reads. If the
+	 *    by the IO time of the PIT access, so we can detect when
+	 *    any disturbance happened between the two reads. If the
 	 *    maximum time is significantly larger than the minimum time,
 	 *    then we discard the result and have another try.
 	 *
 	 * 2) Reference counter. If available we use the HPET or the
 	 *    PMTIMER as a reference to check the sanity of that value.
 	 *    We use separate TSC readouts and check inside of the
-	 *    reference read for a SMI/SMM disturbance. We dicard
+	 *    reference read for any possible disturbance. We dicard
 	 *    disturbed values here as well. We do that around the PIT
 	 *    calibration delay loop as we have to wait for a certain
 	 *    amount of time anyway.
@@ -741,7 +742,7 @@ unsigned long native_calibrate_cpu(void)
 		if (ref1 == ref2)
 			continue;
 
-		/* Check, whether the sampling was disturbed by an SMI */
+		/* Check, whether the sampling was disturbed */
 		if (tsc1 == ULLONG_MAX || tsc2 == ULLONG_MAX)
 			continue;
 
@@ -1180,7 +1181,7 @@ static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
  */
 static void tsc_refine_calibration_work(struct work_struct *work)
 {
-	static u64 tsc_start = -1, ref_start;
+	static u64 tsc_start = ULLONG_MAX, ref_start;
 	static int hpet;
 	u64 tsc_stop, ref_stop, delta;
 	unsigned long freq;
@@ -1195,14 +1196,15 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 	 * delayed the first time we expire. So set the workqueue
 	 * again once we know timers are working.
 	 */
-	if (tsc_start == -1) {
+	if (tsc_start == ULLONG_MAX) {
+restart:
 		/*
 		 * Only set hpet once, to avoid mixing hardware
 		 * if the hpet becomes enabled later.
 		 */
 		hpet = is_hpet_enabled();
-		schedule_delayed_work(&tsc_irqwork, HZ);
 		tsc_start = tsc_read_refs(&ref_start, hpet);
+		schedule_delayed_work(&tsc_irqwork, HZ);
 		return;
 	}
 
@@ -1212,9 +1214,9 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 	if (ref_start == ref_stop)
 		goto out;
 
-	/* Check, whether the sampling was disturbed by an SMI */
-	if (tsc_start == ULLONG_MAX || tsc_stop == ULLONG_MAX)
-		goto out;
+	/* Check, whether the sampling was disturbed */
+	if (tsc_stop == ULLONG_MAX)
+		goto restart;
 
 	delta = tsc_stop - tsc_start;
 	delta *= 1000000LL;

From patchwork Sun May 10 16:25:01 2020
X-Patchwork-Submitter: "Guilherme G.
Piccoli"
X-Patchwork-Id: 1287226
To: kernel-team@lists.ubuntu.com
Subject: [B][PATCH 2/2] x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
Date: Sun, 10 May 2020 13:25:01 -0300
Message-Id: <20200510162501.27181-3-gpiccoli@canonical.com>
In-Reply-To: <20200510162501.27181-1-gpiccoli@canonical.com>
References: <20200510162501.27181-1-gpiccoli@canonical.com>
Cc: gpiccoli@canonical.com

From: Daniel Drake

BugLink: https://bugs.launchpad.net/bugs/1877858

native_calibrate_tsc() had a data table mapping Intel CPU families to
crystal clock speed, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as seen
in commit:

  b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")

By examining CPUID data from http://instlatx64.atw.hu/ and units in the
lab, we have found that 3 different scenarios need to be dealt with, and
we can eliminate most of the hardcoded data using an approach a little
more advanced than before:

 1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
    from this point) report the crystal frequency directly via
    CPUID.0x15. That's definitive data that we can rely upon.

 2. Skylake, Kabylake and all variants of those two chipsets report a
    crystal frequency of zero, however we can calculate the crystal
    clock speed by considering data from CPUID.0x16. This method
    correctly distinguishes between the two crystal clock frequencies
    present on different Skylake X variants that caused headaches
    before. As the calculations do not quite match the
    previously-hardcoded values in some cases (e.g. 23913043Hz instead
    of 24MHz), TSC refinement is enabled on all platforms where we had
    to calculate the crystal frequency in this way.

 3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
    not support CPUID.0x16, so we leave this entry hardcoded.

Suggested-by: Thomas Gleixner
Signed-off-by: Daniel Drake
Reviewed-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Signed-off-by: Ingo Molnar
(cherry picked from commit 604dc9170f2435d27da5039a3efd757dceadc684)
Signed-off-by: Guilherme G. Piccoli
---

IMPORTANT: This one is for Bionic only!
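For a quick sanity check of the arithmetic in scenario 2 above, the same integer calculation the patch adds to native_calibrate_tsc() can be reproduced in plain Python. The sample inputs below are hypothetical, chosen only to reproduce the 23913043Hz figure mentioned in the commit message; on real hardware the denominator/numerator come from CPUID.0x15 and the base frequency from CPUID.0x16:

```python
# Mirror the C integer arithmetic from the patch:
#   crystal_khz = eax_base_mhz * 1000 * eax_denominator / ebx_numerator
# (C integer division, hence // in Python).

def crystal_khz_from_cpuid16(eax_base_mhz, eax_denominator, ebx_numerator):
    # eax_denominator / ebx_numerator: TSC-to-crystal ratio from CPUID.0x15
    # eax_base_mhz: processor base frequency in MHz from CPUID.0x16
    return eax_base_mhz * 1000 * eax_denominator // ebx_numerator

# Hypothetical example: a 2200 MHz base clock with a TSC/crystal ratio of
# 184/2 (= 92) gives 23913 kHz, i.e. the "23913043Hz instead of 24MHz"
# case from the commit message rather than a round 24000 kHz.
print(crystal_khz_from_cpuid16(2200, 2, 184))  # -> 23913
```

This also shows why the commit enables TSC refinement on these platforms: the calculated crystal frequency is close to, but not exactly, the nominal 24 MHz.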
Thanks,
Guilherme

 arch/x86/kernel/tsc.c | 47 +++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f7ff73271199..af08d9ace410 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -613,31 +613,38 @@ unsigned long native_calibrate_tsc(void)
 
 	crystal_khz = ecx_hz / 1000;
 
-	if (crystal_khz == 0) {
-		switch (boot_cpu_data.x86_model) {
-		case INTEL_FAM6_SKYLAKE_MOBILE:
-		case INTEL_FAM6_SKYLAKE_DESKTOP:
-		case INTEL_FAM6_KABYLAKE_MOBILE:
-		case INTEL_FAM6_KABYLAKE_DESKTOP:
-			crystal_khz = 24000;	/* 24.0 MHz */
-			break;
-		case INTEL_FAM6_ATOM_GOLDMONT_X:
-			crystal_khz = 25000;	/* 25.0 MHz */
-			break;
-		case INTEL_FAM6_ATOM_GOLDMONT:
-			crystal_khz = 19200;	/* 19.2 MHz */
-			break;
-		}
-	}
+	/*
+	 * Denverton SoCs don't report crystal clock, and also don't support
+	 * CPUID.0x16 for the calculation below, so hardcode the 25MHz crystal
+	 * clock.
+	 */
+	if (crystal_khz == 0 &&
+	    boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+		crystal_khz = 25000;
 
-	if (crystal_khz == 0)
-		return 0;
 	/*
-	 * TSC frequency determined by CPUID is a "hardware reported"
+	 * TSC frequency reported directly by CPUID is a "hardware reported"
 	 * frequency and is the most accurate one so far we have. This
 	 * is considered a known frequency.
 	 */
-	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+	if (crystal_khz != 0)
+		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+	/*
+	 * Some Intel SoCs like Skylake and Kabylake don't report the crystal
+	 * clock, but we can easily calculate it to a high degree of accuracy
+	 * by considering the crystal ratio and the CPU speed.
+	 */
+	if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+		unsigned int eax_base_mhz, ebx, ecx, edx;
+
+		cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
+		crystal_khz = eax_base_mhz * 1000 *
+			eax_denominator / ebx_numerator;
+	}
+
+	if (crystal_khz == 0)
+		return 0;
 
 	/*
 	 * For Atom SoCs TSC is the only reliable clocksource.
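Circling back to patch 1: the effect of replacing the fixed SMI_TRESHOLD with tsc_khz >> 5 can be checked with a few lines of arithmetic. The new threshold corresponds to a roughly constant ~31 µs window regardless of CPU speed, whereas the old 50000-cycle constant shrank in wall-clock terms as TSCs got faster. A minimal sketch (the frequencies used are illustrative, not from any particular machine):

```python
# Illustrate the threshold arithmetic from patch 1 of this series.

TSC_DEFAULT_THRESHOLD = 0x20000  # fallback from the patch, used before tsc_khz is known

def threshold_cycles(tsc_khz):
    # Mirrors the patch: tsc_khz >> 5 once calibrated, else the default.
    return tsc_khz >> 5 if tsc_khz else TSC_DEFAULT_THRESHOLD

def threshold_us(tsc_khz):
    # Wall-clock size of the window: cycles / (tsc_khz * 1000 Hz), in microseconds.
    return threshold_cycles(tsc_khz) * 1000 / tsc_khz

# A 3 GHz CPU (tsc_khz = 3_000_000) gets 93750 cycles, i.e. 31.25 us.
# A 1 GHz CPU also gets ~31.25 us. By contrast, the old 50000-cycle
# constant meant 50 us at 1 GHz but only ~16.7 us at 3 GHz, which is why
# it could favor slower CPUs, as the commit message notes.
print(threshold_cycles(3_000_000), threshold_us(3_000_000))  # -> 93750 31.25
```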