From patchwork Mon Jul 8 09:31:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rodrigo Cataldo via B4 Relay X-Patchwork-Id: 1958003 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=osuosl.org header.i=@osuosl.org header.a=rsa-sha256 header.s=default header.b=VVkAStq7; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=osuosl.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=intel-wired-lan-bounces@osuosl.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WHngC3Mwdz1xpP for ; Tue, 9 Jul 2024 01:12:55 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 492B840AE0; Mon, 8 Jul 2024 15:12:48 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id FrA0FvWS1kb6; Mon, 8 Jul 2024 15:12:47 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.166.34; helo=ash.osuosl.org; envelope-from=intel-wired-lan-bounces@osuosl.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 4AD2E40AE4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osuosl.org; s=default; t=1720451567; bh=SpZDXkmcSn4EIzA1VQS79soqTfzKGd09fJ6TArrGyqg=; h=From:Date:To:Subject:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:Reply-To:Cc:From; b=VVkAStq7VnMDDEzsXBIF32k72DiEBF+60Pr0BxecRPqMLKqULaHYhIAp4zszG13Vc DZO0X6zNa3wWlIIzJKR4DkB9cg8JZltWUVWCgTGh5mV5uPCTx1jkutV6ckv8x/tg9t mjjSb6UDxnmrusVsIxGkELQ4neCbvMUe87OC2viCqeAxJn8qGbUmGmbTCCk/MF63kh fMWmSlbF7HFFa956IE2SWLNxO523VLZ9RQsO8LqlO2htIpsFA7kkCJuyxhq4BKOGWt /O6p5eUy+K0EyfG29DlRUccYcgBtGags2RiOA8ZY/uB9Kx6qb7xgQGQpguyxfBZ96L YRahOAWdEJaLg== Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by smtp2.osuosl.org (Postfix) with ESMTP id 4AD2E40AE4; Mon, 8 Jul 2024 15:12:47 +0000 (UTC) X-Original-To: intel-wired-lan@lists.osuosl.org Delivered-To: intel-wired-lan@lists.osuosl.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by ash.osuosl.org (Postfix) with ESMTP id 5E1A81BF239 for ; Mon, 8 Jul 2024 09:32:08 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 47888404F9 for ; Mon, 8 Jul 2024 09:32:08 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id cxARIbFNKgTm for ; Mon, 8 Jul 2024 09:32:06 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=139.178.84.217; helo=dfw.source.kernel.org; envelope-from=devnull+rodrigo.cadore.l-acoustics.com@kernel.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp2.osuosl.org 7EFE940540 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 7EFE940540 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by smtp2.osuosl.org (Postfix) with ESMTPS id 7EFE940540 for ; Mon, 8 Jul 2024 09:32:06 +0000 (UTC) Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 35C6560B21; Mon, 8 Jul 2024 09:32:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPS id D8C8DC116B1; Mon, 8 Jul 2024 09:32:04 +0000 (UTC) Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id C094AC3271E; Mon, 8 Jul 2024 09:32:04 +0000 (UTC) From: Rodrigo Cataldo via B4 Relay Date: Mon, 08 Jul 2024 11:31:57 +0200 MIME-Version: 1.0 Message-Id: <20240708-igc-flush-ptm-request-before-timeout-6-10-v1-1-70e5ebec9efe@l-acoustics.com> X-B4-Tracking: v=1; b=H4sIAAyyi2YC/x3NSwqDMBRG4a3IHfdCDDal3UrpwMY/ekETm0cVx L03dPhNzjkoIQoSPZqDIr6SJPiK9tKQnXo/gmWoJq10p27qyjJadnNJE6954YhPQcr8hgsRnGV BKJkNt4qdgdL2bpwZOqq9NcLJ/n89SbaZPTK9zvMHM7PALoQAAAA= To: Jesse Brandeburg , Tony Nguyen , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Richard Cochran , Vinicius Costa Gomes , Kurt Kanzenbach , "Christopher S. Hall" X-Mailer: b4 0.12.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=11903; i=rodrigo.cadore@l-acoustics.com; h=from:subject:message-id; bh=Wfg75g71XmXkv4d1mekMWpTpqpPeT5sIegVpLSTXqac=; b=owGbwMvMwCGWd67IourLsUDG02pJDGndm4SDe2rWaU5LS9nG91Q5ameo1u2GTUn/a/imP74VF 7DqSvuljlIWBjEOBlkxRRb933xOq8uUjDgSJ6+EmcPKBDKEgYtTACZSE8bwP1HI4VDd85nKhh// LTjbek16i3PDpzXdZ1ds8LjxaeNxkzKGf3Y7eF7fZmVQcktRZ51/WK306uHtpwpNlbwXhigFsT1 cyAcA X-Developer-Key: i=rodrigo.cadore@l-acoustics.com; a=openpgp; fpr=E0F4E67DE69A235AC356157D2DDD1455748BC38F X-Endpoint-Received: by B4 Relay for rodrigo.cadore@l-acoustics.com/default with auth_id=109 X-Original-From: Rodrigo Cataldo X-Mailman-Approved-At: Mon, 08 Jul 2024 15:12:42 +0000 X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720431124; bh=PSxQJUpdzgyKLJoY9ztDBUzNUgu33n/mFOB+EZ35Res=; h=From:Date:Subject:To:Cc:Reply-To:From; b=bxeDCgNEcAlRzBsyoNuXrC7t/SmAs9R3ksxeZnuu8YhJJLTUj2Bj887a1F1cNV359 P/xZBsGCnGCLtTqROVcHH6A/5NjGp3L/cuPEj3zCIM8nf2ncp4bukEnLYSYxg8I6zB WB+++IFF6u9avgtM2FPv+VEPhTs7yMsZbpEY1kKRB6yjALSJTMochxEmZQji+ON4/5 wDhhrbILuHjIrK0sgYIIOeEEHpaolBtYzZIGcKSnZBZFTDDudQHRkzRzmPaOzPoBIm sydRx6cwDKv+g/mAoFkE2kXCbpnOy1JH0/nrQ+mW/sPjZwnziwNwSNyXwxfMUYlR6R Vz7DeS3BLF6ig== X-Mailman-Original-Authentication-Results: smtp2.osuosl.org; dmarc=pass (p=none dis=none) header.from=kernel.org X-Mailman-Original-Authentication-Results: smtp2.osuosl.org; dkim=pass (2048-bit key, unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256 header.s=k20201202 header.b=bxeDCgNE Subject: [Intel-wired-lan] [PATCH iwl-net] igc: Ensure PTM request is completed before timeout has started X-BeenThere: intel-wired-lan@osuosl.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Wired Ethernet Linux Kernel Driver Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: rodrigo.cadore@l-acoustics.com Cc: netdev@vger.kernel.org, Rodrigo Cataldo , intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org Errors-To: intel-wired-lan-bounces@osuosl.org Sender: "Intel-wired-lan" From: Rodrigo Cataldo When a PTM is requested via wr32(IGC_PTM_STAT), the operation may only be completed by the next read operation (flush). Unfortunately, the next read operation in the PTM request loop happens after we have already started evaluating the response timeout. Thus, the following behavior has been observed:: phc2sys-1655 [010] 103.233752: funcgraph_entry: | igc_ptp_getcrosststamp() { phc2sys-1655 [010] 103.233754: funcgraph_entry: | igc_phc_get_syncdevice_time() { phc2sys-1655 [010] 103.233755: funcgraph_entry: | igc_rd32() { phc2sys-1655 [010] 103.233931: preempt_disable: caller=irq_enter_rcu+0x14 parent=irq_enter_rcu+0x14 phc2sys-1655 [010] 103.233932: local_timer_entry: vector=236 phc2sys-1655 [010] 103.233932: hrtimer_cancel: hrtimer=0xffff8edeef526118 phc2sys-1655 [010] 103.233933: hrtimer_expire_entry: hrtimer=0xffff8edeef526118 now=103200127876 function=tick_nohz_handler/0x0 ... tick handler ... phc2sys-1655 [010] 103.233971: funcgraph_exit: ! 215.559 us | } phc2sys-1655 [010] 103.233972: funcgraph_entry: | igc_rd32() { phc2sys-1655 [010] 103.234135: funcgraph_exit: ! 164.370 us | } phc2sys-1655 [010] 103.234136: funcgraph_entry: 1.942 us | igc_rd32(); phc2sys-1655 [010] 103.234147: console: igc 0000:03:00.0 enp3s0: Timeout reading IGC_PTM_STAT register Based on the (simplified) code:: ctrl = rd32(IGC_PTM_CTRL); /* simplified: multiple writes here */ wr32(IGC_PTM_STAT, IGC_PTM_STAT_VALID); err = readx_poll_timeout(rd32, IGC_PTM_STAT, stat, stat, IGC_PTM_STAT_SLEEP, IGC_PTM_STAT_TIMEOUT); if (err < 0) { netdev_err(adapter->netdev, "Timeout reading IGC_PTM_STAT register\n"); return err; } Where readx_poll_timeout() starts the timeout evaluation before calling the rd32() parameter (rd32() is a macro for igc_rd32()). In the trace shown, the read operation of readx_poll_timeout() (second igc_rd32()) took so long that the timeout (IGC_PTM_STAT_VALID) has expired and no sleep has been performed. With this patch, a write flush is added (which is an additional igc_rd32() in practice) that can wait for the write before the timeout is evaluated:: phc2sys-1615 [010] 74.517954: funcgraph_entry: | igc_ptp_getcrosststamp() { phc2sys-1615 [010] 74.517956: funcgraph_entry: | igc_phc_get_syncdevicetime() { phc2sys-1615 [010] 74.517957: funcgraph_entry: | igc_rd32() { phc2sys-1615 [010] 74.518127: preempt_disable: caller=irq_enter_rcu+0x14 parent=irq_enter_rcu+0x14 phc2sys-1615 [010] 74.518128: local_timer_entry: vector=236 phc2sys-1615 [010] 74.518128: hrtimer_cancel: hrtimer=0xffff96466f526118 phc2sys-1615 [010] 74.518128: hrtimer_expire_entry: hrtimer=0xffff96466f526118 now=74484007229 function=tick_nohz_handler/0x0 ... tick handler ... phc2sys-1615 [010] 74.518180: funcgraph_exit: ! 222.282 us | } phc2sys-1615 [010] 74.518181: funcgraph_entry: | igc_rd32() { phc2sys-1615 [010] 74.518349: funcgraph_exit: ! 168.160 us | } phc2sys-1615 [010] 74.518349: funcgraph_entry: 1.970 us | igc_rd32(); phc2sys-1615 [010] 74.518352: hrtimer_init: hrtimer=0xffffa6f9413a3940 clockid=CLOCK_MONOTONIC mode=0x0 phc2sys-1615 [010] 74.518352: preempt_disable: caller=_raw_spin_lock_irqsave+0x28 parent=hrtimer_start_range_ns+0x56 phc2sys-1615 [010] 74.518353: hrtimer_start: hrtimer=0xffffa6f9413a3940 function=hrtimer_wakeup/0x0 expires=74484232878 softexpires=74484231878 .. hrtimer setup and return ... kworker/10:1-242 [010] 74.518382: sched_switch: kworker/10:1:242 [120] W ==> phc2sys:1615 [120] phc2sys-1615 [010] 74.518383: preempt_enable: caller=schedule+0x36 parent=schedule+0x36 phc2sys-1615 [010] 74.518384: funcgraph_entry: ! 100.088 us | igc_rd32(); phc2sys-1615 [010] 74.518484: funcgraph_entry: 1.958 us | igc_rd32(); phc2sys-1615 [010] 74.518488: funcgraph_entry: 2.019 us | igc_rd32(); phc2sys-1615 [010] 74.518490: funcgraph_entry: 1.956 us | igc_rd32(); phc2sys-1615 [010] 74.518492: funcgraph_entry: 1.980 us | igc_rd32(); phc2sys-1615 [010] 74.518494: funcgraph_exit: ! 539.386 us | } Now the sleep is called as expected, and the operation succeeds. Therefore, regardless of how long it will take for the write to be completed, we will poll+sleep at least for the time specified in IGC_PTM_STAT_TIMEOUT. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Rodrigo Cataldo Acked-by: Vinicius Costa Gomes Reviewed-by: Kurt Kanzenbach --- I have been searching for the proper solution for this PTM issue for a long time. The issue was observed on a 13700 (Raptor Lake). We also use a 8500 (Coffee Lake) that is much less susceptible for this issue, but still happens. Both are using I225-LM cards. For a 24 hours test, idle system, I have observed with 13700:: number of timeouts in 86400 seconds: 2509 The same test on a 8500:: number of timeouts in 86400 seconds: 9 Where one PTM request is sent per second. Test was done this script:: record_multiple_timeout_param() { local taskset_cpu=$1 local cur_limit=$((SECONDS + LIMIT_SECONDS )) local timeouts=0 while [ $SECONDS -lt $cur_limit ] do REL_TO=$((cur_limit - SECONDS)) timeout $REL_TO taskset $taskset_cpu \ phc2sys -c $ITF_NAME -s CLOCK_REALTIME -O 37 -m 1>/dev/null if [ $? -eq 255 ]; then timeouts=$((timeouts + 1)) fi done printf "\tnumber of timeouts in %s seconds: %d\n" $LIMIT_SECONDS $timeouts } record_multiple_timeout_param $NON_ISOLCPU_MASK Firmware version for the cards:: # lshw -class network -json | jq '.[0].product,.[0].configuration.firmware' "Ethernet Controller I225-LM" "1057:8754" # lshw -class network -json | jq '.[2].product,.[2].configuration.firmware' "Ethernet Controller I225-LM" "1057:8754 A couple of attempts were made that did not lead to solving the issue (disabling ASPM in kernel and boot, using periodic tick), and a couple of solutions that worked but that were subpar: 1. The issue was not observed for a phc2sys(8) running on a fully isolated nohz_full core. We do not have the luxury of dedicating a a core for it. 2. Bumping the IGC_PTM_STAT_TIMEOUT value. Other machines may need another value though. 3. Retry (multiple times) readx_poll_timeout() in case of failure. This may significantly increase the function latency, because each readx_poll_timeout() can take more than 100 us. 4. Disabling preemption during the PTM request. Horrible. For the Coffee Lake machine, the issue tends to be avoided because the read does not take so long. Here is basically the same trace using the Cofee Lake machine:: phc2sys-1204 [002] 1778.220288: funcgraph_entry: | igc_ptp_getcrosststamp() { phc2sys-1204 [002] 1778.220290: funcgraph_entry: | igc_phc_get_syncdevicetime() { phc2sys-1204 [002] 1778.220291: funcgraph_entry: | igc_rd32() { phc2sys-1204 [002] 1778.220373: preempt_disable: caller=irq_enter_rcu+0x14 parent=irq_enter_rcu+0x14 phc2sys-1204 [002] 1778.220374: local_timer_entry: vector=236 phc2sys-1204 [002] 1778.220375: hrtimer_cancel: hrtimer=0xffff894027326118 phc2sys-1204 [002] 1778.220376: hrtimer_expire_entry: hrtimer=0xffff894027326118 now=1778228034802 function=tick_nohz_handler/0x0 ... tick handler ... phc2sys-1204 [002] 1778.220411: funcgraph_exit: ! 119.843 us | } phc2sys-1204 [002] 1778.220412: funcgraph_entry: | igc_rd32() { phc2sys-1204 [002] 1778.220492: funcgraph_exit: + 80.094 us | } phc2sys-1204 [002] 1778.220493: funcgraph_entry: 2.951 us | igc_rd32(); phc2sys-1204 [002] 1778.220497: hrtimer_init: hrtimer=0xffffa504c0d83aa0 clockid=CLOCK_MONOTONIC mode=0x0 phc2sys-1204 [002] 1778.220498: preempt_disable: caller=_raw_spin_lock_irqsave+0x28 parent=hrtimer_start_range_ns+0x56 phc2sys-1204 [002] 1778.220499: hrtimer_start: hrtimer=0xffffa504c0d83aa0 function=hrtimer_wakeup/0x0 expires=1778228158866 softexpires=1778228157866 ... timer setup .... phc2sys-1204 [002] 1778.220509: preempt_enable: caller=_raw_spin_unlock_irqrestore+0x2b parent=hrtimer_start_range_ns+0x12d phc2sys-1204 [002] 1778.220511: funcgraph_entry: 7.338 us | igc_rd32(); phc2sys-1204 [002] 1778.220519: funcgraph_entry: 2.769 us | igc_rd32(); phc2sys-1204 [002] 1778.220522: funcgraph_entry: 2.798 us | igc_rd32(); phc2sys-1204 [002] 1778.220525: funcgraph_entry: 2.736 us | igc_rd32(); phc2sys-1204 [002] 1778.220529: funcgraph_entry: 2.750 us | igc_rd32(); phc2sys-1204 [002] 1778.220532: funcgraph_exit: ! 242.656 us | } For both machines, I observed that the first igc_rd32() after an idle period (more than 10us) tends to take significantly more time. I assume this is a hardware power-saving technique, but I could not find any mention in the manuals. This is very easily observable with an idle system running phc2sys, since it will request only once every second. This is the typical trace of the operation:: phc2sys-1204 [002] 1749.209397: funcgraph_entry: | igc_ptp_getcrosststamp() { phc2sys-1204 [002] 1749.209398: funcgraph_entry: | igc_phc_get_syncdevicetime() { phc2sys-1204 [002] 1749.209400: funcgraph_entry: + 81.491 us | igc_rd32(); phc2sys-1204 [002] 1749.209482: funcgraph_entry: 3.691 us | igc_rd32(); phc2sys-1204 [002] 1749.209487: funcgraph_entry: 2.942 us | igc_rd32(); phc2sys-1204 [002] 1749.209490: hrtimer_init: hrtimer=0xffffa504c0d83a00 clockid=CLOCK_MONOTONIC mode=0x0 phc2sys-1204 [002] 1749.209491: preempt_disable: caller=_raw_spin_lock_irqsave+0x28 parent=hrtimer_start_range_ns+0x56 ... timer setup and it goes on like before ... The preemption needs to happen for this issue, so that this power-saving mode is triggered, making the igc_rd32 'slow enough' so that all of the timeout is consumed before the card has time to answer. I believe flushing the write solves the issue definitely, since the write should be completed before the timeout has started. So that, even if the timeout is consumed by a slow read operation, the write has been received before and the card had time to process the request. --- drivers/net/ethernet/intel/igc/igc_ptp.c | 4 ++++ 1 file changed, 4 insertions(+) --- base-commit: 0005b2dc43f96b93fc5b0850d7ca3f7aeac9129c change-id: 20240705-igc-flush-ptm-request-before-timeout-6-10-f6e02c96f6d4 Best regards, diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c index 1bb026232efc..d7269e4f1a21 100644 --- a/drivers/net/ethernet/intel/igc/igc_ptp.c +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c @@ -1005,6 +1005,10 @@ static int igc_phc_get_syncdevicetime(ktime_t *device, * VALID bit. */ wr32(IGC_PTM_STAT, IGC_PTM_STAT_VALID); + /* Ensure the hardware receives the ptm request before the + * response timeout starts. + */ + wrfl(); err = readx_poll_timeout(rd32, IGC_PTM_STAT, stat, stat, IGC_PTM_STAT_SLEEP,