From patchwork Tue Jan 8 20:18:44 2019
X-Patchwork-Submitter: "Guilherme G. Piccoli"
X-Patchwork-Id: 1022119
From: "Guilherme G. Piccoli"
To: kernel-team@lists.ubuntu.com
Cc: gpiccoli@canonical.com
Subject: [SRU X] [PATCH 0/5] Line discipline buffer flush/tty_reopen() race fix
Date: Tue, 8 Jan 2019 18:18:44 -0200
Message-Id: <20190108201849.11907-1-gpiccoli@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/1791758

[Impact]

* Line discipline code is racy when a buffer is being flushed while the
  tty is being initialized or reinitialized.
  For the initialization race, an upstream patch has existed since
  January 2018: b027e2298bd5 ("tty: fix data race between tty_init_dev
  and flush of buf"). It is not in the Ubuntu 4.4 kernel, only in 4.15
  and later kernels.

* For the race between the buffer flush and the tty being reopened, a
  patch addressing the issue was recently merged for 5.0-rc1:
  83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
  Ubuntu kernel currently contains this patch, hence this SRU request.
  The complete upstream patch series is in [0].

* Both patches take a similar approach: they rely on locking/semaphores
  to prevent the race conditions. Some additional patches are necessary
  to prevent related issues, such as a potential deadlock caused by
  prioritizing I/O servicing over releasing tty_ldisc_lock(); refer to
  c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
  All the necessary fixes are grouped in this SRU request.

* The symptom of the race between the buffer flush and the tty reopen
  routine is a kernel crash with the following trace:

  BUG: unable to handle kernel paging request at 0000000000002268
  IP: [] n_tty_receive_buf_common+0x6a/0xae0
  [...]
  Call Trace:
  [] ? kvm_sched_clock_read+0x1e/0x30
  [] n_tty_receive_buf2+0x14/0x20
  [] flush_to_ldisc+0xd5/0x120
  [] process_one_work+0x156/0x400
  [] worker_thread+0x11a/0x480
  [...]

* A kernel crash was collected from a user; the analysis is in
  comment #4 of LP #1791758.

[Test Case]

* This fault is not trivial to trigger. The usual recipe is to keep
  accessing a machine through SSH (or to keep killing getty on an IPMI
  serial console) while running commands before the terminal on that
  machine is ready, for example by echoing into ttySx or a pts in an
  infinite loop.

* Users have reported reproducing this issue in their production
  environments; with the patches in this SRU request applied, the
  problem was fixed.

[Regression Potential]

* The tty subsystem is central to the kernel, and patches in that area
  are always delicate. For example, the upstream series [0] is a re-spin
  (V6) prompted by a hard-to-reproduce issue reported on the PA-RISC
  architecture. That issue was found in the V5 iteration [1] and was
  fixed by patch c96cf923a98d, which is included in this SRU request.

* The patchset [0] has been in the tty-next tree since mid-November, and
  patch b027e2298bd5 has been available upstream since January 2018 (it
  is present in both the Ubuntu 4.15 and 4.18 kernels), so the overall
  likelihood of regressions is low.

* These patches were sniff-tested on the 3 versions (4.4, 4.15 and 4.18)
  and didn't show any issues.

[0] https://marc.info/?l=linux-kernel&m=154103190111795
[1] https://marc.info/?l=linux-kernel&m=153737852618183

Dmitry Safonov (4):
  tty: Drop tty->count on tty_reopen() failure
  tty: Hold tty_ldisc_lock() during tty_reopen()
  tty: Don't block on IO when ldisc change is pending
  tty: Simplify tty->count math in tty_reopen()

Gaurav Kohli (1):
  tty: fix data race between tty_init_dev and flush of buf

 drivers/tty/n_hdlc.c    |  4 ++--
 drivers/tty/n_r3964.c   |  2 +-
 drivers/tty/n_tty.c     |  8 ++++----
 drivers/tty/tty_io.c    | 20 +++++++++++++++++---
 drivers/tty/tty_ldisc.c | 11 +++++++++--
 include/linux/tty.h     |  9 +++++++++
 6 files changed, 42 insertions(+), 12 deletions(-)

Acked-by: Kleber Sacilotto de Souza
Acked-by: Stefan Bader