From patchwork Tue Feb 27 14:10:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philip Cox
X-Patchwork-Id: 1905116
From: Philip Cox
To: kernel-team@lists.ubuntu.com
Subject: [SRU][mantic:linux][PATCH 1/1] x86/tsc: Extend watchdog check
 exemption to 4-Sockets platform
Date: Tue, 27 Feb 2024 09:10:37 -0500
Message-Id: <20240227141037.1744821-2-philip.cox@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240227141037.1744821-1-philip.cox@canonical.com>
References: <20240227141037.1744821-1-philip.cox@canonical.com>
List-Id: Kernel team discussions

From: Feng Tang

BugLink: https://bugs.launchpad.net/bugs/2054699

There were reports again that the tsc clocksource on 4 sockets x86
servers was wrongly judged as 'unstable' by 'jiffies' and other
watchdogs, and disabled [1][2].

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to deal with these false
alarms of tsc unstable issues, covering qualified platforms with 2
sockets or fewer.

And from the history of chasing TSC issues, Thomas and Peter only saw
real TSC synchronization issues on 8 socket machines.

So extend the exemption to 4 sockets to fix the issue.

Rui also proposed another way to disable 'jiffies' as clocksource
watchdog [3], which can also solve the problem in [1] in an
architecture independent way, but can't cure the problem in [2],
whose watchdog is HPET or PMTIMER, while 'jiffies' is mostly used as
watchdog in the boot phase.

'nr_online_nodes' is known to be inaccurate in cases like platforms
with cpu-less memory nodes, sub numa cluster enabled, fakenuma, the
kernel cmdline parameter 'maxcpus=', etc. The harmful case is the
'maxcpus' one, which could underestimate the package number and so
disable the watchdog, but the bright side is that it is mostly for
debug usage. All these will be addressed in other patches, as
discussed in thread [4].

[1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
[2]. https://lore.kernel.org/lkml/06df410c-2177-4671-832f-339cff05b1d9@paulmck-laptop/
[3]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
[4]. https://lore.kernel.org/all/20221021062131.1826810-1-feng.tang@intel.com/

Reported-by: Yu Liao
Reported-by: Paul E. McKenney
Signed-off-by: Feng Tang
Signed-off-by: Paul E. McKenney
(cherry picked from commit 233756a640be811efae33763db718fe29753b1e9)
Signed-off-by: Philip Cox
---
 arch/x86/kernel/tsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3425c6a943e4..15f97c0abc9d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1258,7 +1258,7 @@ static void __init check_system_tsc_reliable(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
+	    nr_online_nodes <= 4)
 		tsc_disable_clocksource_watchdog();
 }
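
The condition changed by the hunk above can be modelled in userspace to see which platforms gain the exemption. This is only a rough sketch and not kernel code: struct tsc_platform and tsc_watchdog_exempt() are hypothetical names made up for illustration, and the max_nodes parameter stands in for the hard-coded limit (2 before this patch, 4 after).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the inputs that
 * check_system_tsc_reliable() consults. Not a kernel structure.
 */
struct tsc_platform {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
	int nr_online_nodes;	/* models the kernel's nr_online_nodes */
};

/*
 * Returns true when the clocksource watchdog would be disabled for
 * the TSC under the given node limit. All three feature bits must be
 * present and the node count must not exceed the limit.
 */
static bool tsc_watchdog_exempt(const struct tsc_platform *p, int max_nodes)
{
	return p->constant_tsc &&
	       p->nonstop_tsc &&
	       p->tsc_adjust &&
	       p->nr_online_nodes <= max_nodes;
}
```

With all three feature bits set and nr_online_nodes equal to 4, the model reports no exemption under the old limit of 2 and an exemption under the new limit of 4. A node count of 8 is not exempt under either limit. The model also shows why the 'maxcpus=' case is the harmful one: if the node count is under-reported, the check passes even though the real machine is larger.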