Message ID | 20190108201849.11907-1-gpiccoli@canonical.com |
---|---|
Headers | show |
Series | Line discipline buffer flush/tty_reopen() race fix | expand |
On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote: > BugLink: https://bugs.launchpad.net/bugs/1791758 > > [Impact] > > * Line discipline code is racy when we have buffer being flush while the > tty is being initialized or reinitialized. For the first problem, we have > an upstream patch since January 2018: > b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf"); > although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent > ones. > > * For the race between the buffer flush while tty is being reopened, we > have a patch that addresses this issue recently merged for 5.0-rc1: > 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). > No Ubuntu kernel currently contains this patch, hence we're hereby > submitting the SRU request. The upstream complete patch series for > this is in [0]. > > * The approach of both patches are similar - they rely in locking/semaphore > to prevent race conditions. Some additional patches are necessary to prevent > correlated issues, like preventing a potential deadlock due to bad > prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to > c96cf923a98d ("tty: Don't block on IO when ldisc change is pending"). > All the necessary fixes are grouped here in this SRU request. > > * The symptom of the race condition between the buffer flush and the > tty reopen routine is a kernel crash with the following trace: > > > BUG: unable to handle kernel paging request at 0000000000002268 > IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0 > [...] > Call Trace: > [<addr>] ? kvm_sched_clock_read+0x1e/0x30 > [<addr>] n_tty_receive_buf2+0x14/0x20 > [<addr>] flush_to_ldisc+0xd5/0x120 > [<addr>] process_one_work+0x156/0x400 > [<addr>] worker_thread+0x11a/0x480 > [...] > > > * A kernel crash was collected from an user, analysis is present in > comment #4 in LP #1791758. > > > [Test Case] > > * It is not trivial to trigger this fault, but the usual recipe is to keep > accessing a machine through SSH (or keep killing getty when in IPMI serial > console) and in some way run commands before the terminal is ready in that > machine (like hacking some echo into ttySx or pts in an infinite loop). > > * We have reports of users that could reproduce this issue in their > production environment, and with the patches present in this SRU request > the problem was fixed. > > > [Regression Potential] > > * tty subsystem is highly central and patches in that area are always > delicate. For example, the upstream series [0] is a re-spin (V6) due to > a hard to reproduce issue reported in the PA-RISC architecture, which was > found in the V5 iteration [1] but was fixed by the patch c96cf923a98d, > present in this SRU request. > > * The patchset [0] is present in tty-next tree since mid-November, and the > patch b027e2298bd5 is available upstream since January/2018 (it's available > in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of > regressions is low. > > * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18) > and didn't show any issues. > > > [0] https://marc.info/?l=linux-kernel&m=154103190111795 > [1] https://marc.info/?l=linux-kernel&m=153737852618183 > > > Dmitry Safonov (4): > tty: Drop tty->count on tty_reopen() failure > tty: Hold tty_ldisc_lock() during tty_reopen() > tty: Don't block on IO when ldisc change is pending > tty: Simplify tty->count math in tty_reopen() > > Gaurav Kohli (1): > tty: fix data race between tty_init_dev and flush of buf > > drivers/tty/n_hdlc.c | 4 ++-- > drivers/tty/n_r3964.c | 2 +- > drivers/tty/n_tty.c | 8 ++++---- > drivers/tty/tty_io.c | 20 +++++++++++++++++--- > drivers/tty/tty_ldisc.c | 11 +++++++++-- > include/linux/tty.h | 9 +++++++++ > 6 files changed, 42 insertions(+), 12 deletions(-) > Those are indeed some sensitive changes and a bit intrusive, but they are needed to fix the issue and most of them have been around on mainline for some time now. I guess we'll also detect issues with regression tests if they can be easily broken. Please note that the required format for the provenance of the patch is: (backported from commit <sha1> <upstream repo name>) or (cherry-pick from commit <sha1> <upstream repo name>) so the "commit" part is missing on those patches. The <upstream repo name> can be omitted if they come from upstream. This can be fixed when applying, but please keep it in mind for the next submissions :-). Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On Wed, Jan 9, 2019 at 7:57 AM Kleber Souza <kleber.souza@canonical.com> wrote: > [...] > Those are indeed some sensitive changes and a bit intrusive, but they > are needed to fix the issue and most of them have been around on > mainline for some time now. I guess we'll also detect issues with > regression tests if they can be easily broken. > > Please note that the required format for the provenance of the patch is: > > (backported from commit <sha1> <upstream repo name>) > or > (cherry-pick from commit <sha1> <upstream repo name>) > > so the "commit" part is missing on those patches. The <upstream repo > name> can be omitted if they come from upstream. This can be fixed when > applying, but please keep it in mind for the next submissions :-). > > > Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> > > Thanks Kleber, for the ack! Sorry for the missing "commit" word, I'm not sure how I forgot it heheh Sent some SRUs last weeks, and for some reason, my mind forgot this here. Cheers, Guilherme
On 09.01.19 10:57, Kleber Souza wrote: > On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote: >> BugLink: https://bugs.launchpad.net/bugs/1791758 >> >> [Impact] >> >> * Line discipline code is racy when we have buffer being flush while the >> tty is being initialized or reinitialized. For the first problem, we have >> an upstream patch since January 2018: >> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf"); >> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent >> ones. >> >> * For the race between the buffer flush while tty is being reopened, we >> have a patch that addresses this issue recently merged for 5.0-rc1: >> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). >> No Ubuntu kernel currently contains this patch, hence we're hereby >> submitting the SRU request. The upstream complete patch series for >> this is in [0]. >> >> * The approach of both patches are similar - they rely in locking/semaphore >> to prevent race conditions. Some additional patches are necessary to prevent >> correlated issues, like preventing a potential deadlock due to bad >> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to >> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending"). >> All the necessary fixes are grouped here in this SRU request. >> >> * The symptom of the race condition between the buffer flush and the >> tty reopen routine is a kernel crash with the following trace: >> >> >> BUG: unable to handle kernel paging request at 0000000000002268 >> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0 >> [...] >> Call Trace: >> [<addr>] ? kvm_sched_clock_read+0x1e/0x30 >> [<addr>] n_tty_receive_buf2+0x14/0x20 >> [<addr>] flush_to_ldisc+0xd5/0x120 >> [<addr>] process_one_work+0x156/0x400 >> [<addr>] worker_thread+0x11a/0x480 >> [...] >> >> >> * A kernel crash was collected from an user, analysis is present in >> comment #4 in LP #1791758. >> >> >> [Test Case] >> >> * It is not trivial to trigger this fault, but the usual recipe is to keep >> accessing a machine through SSH (or keep killing getty when in IPMI serial >> console) and in some way run commands before the terminal is ready in that >> machine (like hacking some echo into ttySx or pts in an infinite loop). >> >> * We have reports of users that could reproduce this issue in their >> production environment, and with the patches present in this SRU request >> the problem was fixed. >> >> >> [Regression Potential] >> >> * tty subsystem is highly central and patches in that area are always >> delicate. For example, the upstream series [0] is a re-spin (V6) due to >> a hard to reproduce issue reported in the PA-RISC architecture, which was >> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d, >> present in this SRU request. >> >> * The patchset [0] is present in tty-next tree since mid-November, and the >> patch b027e2298bd5 is available upstream since January/2018 (it's available >> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of >> regressions is low. >> >> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18) >> and didn't show any issues. >> >> >> [0] https://marc.info/?l=linux-kernel&m=154103190111795 >> [1] https://marc.info/?l=linux-kernel&m=153737852618183 >> >> >> Dmitry Safonov (4): >> tty: Drop tty->count on tty_reopen() failure >> tty: Hold tty_ldisc_lock() during tty_reopen() >> tty: Don't block on IO when ldisc change is pending >> tty: Simplify tty->count math in tty_reopen() >> >> Gaurav Kohli (1): >> tty: fix data race between tty_init_dev and flush of buf >> >> drivers/tty/n_hdlc.c | 4 ++-- >> drivers/tty/n_r3964.c | 2 +- >> drivers/tty/n_tty.c | 8 ++++---- >> drivers/tty/tty_io.c | 20 +++++++++++++++++--- >> drivers/tty/tty_ldisc.c | 11 +++++++++-- >> include/linux/tty.h | 9 +++++++++ >> 6 files changed, 42 insertions(+), 12 deletions(-) >> > Those are indeed some sensitive changes and a bit intrusive, but they > are needed to fix the issue and most of them have been around on > mainline for some time now. I guess we'll also detect issues with > regression tests if they can be easily broken. > > Please note that the required format for the provenance of the patch is: > > (backported from commit <sha1> <upstream repo name>) > or > (cherry-pick from commit <sha1> <upstream repo name>) no, "cherry picked from commit ..." without the - and with an "ed" ;) > > so the "commit" part is missing on those patches. The <upstream repo > name> can be omitted if they come from upstream. This can be fixed when > applying, but please keep it in mind for the next submissions :-). > > > Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> > > >
On 1/9/19 11:10 AM, Stefan Bader wrote: > On 09.01.19 10:57, Kleber Souza wrote: >> On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote: >>> BugLink: https://bugs.launchpad.net/bugs/1791758 >>> >>> [Impact] >>> >>> * Line discipline code is racy when we have buffer being flush while the >>> tty is being initialized or reinitialized. For the first problem, we have >>> an upstream patch since January 2018: >>> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf"); >>> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent >>> ones. >>> >>> * For the race between the buffer flush while tty is being reopened, we >>> have a patch that addresses this issue recently merged for 5.0-rc1: >>> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). >>> No Ubuntu kernel currently contains this patch, hence we're hereby >>> submitting the SRU request. The upstream complete patch series for >>> this is in [0]. >>> >>> * The approach of both patches are similar - they rely in locking/semaphore >>> to prevent race conditions. Some additional patches are necessary to prevent >>> correlated issues, like preventing a potential deadlock due to bad >>> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to >>> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending"). >>> All the necessary fixes are grouped here in this SRU request. >>> >>> * The symptom of the race condition between the buffer flush and the >>> tty reopen routine is a kernel crash with the following trace: >>> >>> >>> BUG: unable to handle kernel paging request at 0000000000002268 >>> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0 >>> [...] >>> Call Trace: >>> [<addr>] ? kvm_sched_clock_read+0x1e/0x30 >>> [<addr>] n_tty_receive_buf2+0x14/0x20 >>> [<addr>] flush_to_ldisc+0xd5/0x120 >>> [<addr>] process_one_work+0x156/0x400 >>> [<addr>] worker_thread+0x11a/0x480 >>> [...] >>> >>> >>> * A kernel crash was collected from an user, analysis is present in >>> comment #4 in LP #1791758. >>> >>> >>> [Test Case] >>> >>> * It is not trivial to trigger this fault, but the usual recipe is to keep >>> accessing a machine through SSH (or keep killing getty when in IPMI serial >>> console) and in some way run commands before the terminal is ready in that >>> machine (like hacking some echo into ttySx or pts in an infinite loop). >>> >>> * We have reports of users that could reproduce this issue in their >>> production environment, and with the patches present in this SRU request >>> the problem was fixed. >>> >>> >>> [Regression Potential] >>> >>> * tty subsystem is highly central and patches in that area are always >>> delicate. For example, the upstream series [0] is a re-spin (V6) due to >>> a hard to reproduce issue reported in the PA-RISC architecture, which was >>> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d, >>> present in this SRU request. >>> >>> * The patchset [0] is present in tty-next tree since mid-November, and the >>> patch b027e2298bd5 is available upstream since January/2018 (it's available >>> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of >>> regressions is low. >>> >>> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18) >>> and didn't show any issues. >>> >>> >>> [0] https://marc.info/?l=linux-kernel&m=154103190111795 >>> [1] https://marc.info/?l=linux-kernel&m=153737852618183 >>> >>> >>> Dmitry Safonov (4): >>> tty: Drop tty->count on tty_reopen() failure >>> tty: Hold tty_ldisc_lock() during tty_reopen() >>> tty: Don't block on IO when ldisc change is pending >>> tty: Simplify tty->count math in tty_reopen() >>> >>> Gaurav Kohli (1): >>> tty: fix data race between tty_init_dev and flush of buf >>> >>> drivers/tty/n_hdlc.c | 4 ++-- >>> drivers/tty/n_r3964.c | 2 +- >>> drivers/tty/n_tty.c | 8 ++++---- >>> drivers/tty/tty_io.c | 20 +++++++++++++++++--- >>> drivers/tty/tty_ldisc.c | 11 +++++++++-- >>> include/linux/tty.h | 9 +++++++++ >>> 6 files changed, 42 insertions(+), 12 deletions(-) >>> >> Those are indeed some sensitive changes and a bit intrusive, but they >> are needed to fix the issue and most of them have been around on >> mainline for some time now. I guess we'll also detect issues with >> regression tests if they can be easily broken. >> >> Please note that the required format for the provenance of the patch is: >> >> (backported from commit <sha1> <upstream repo name>) >> or >> (cherry-pick from commit <sha1> <upstream repo name>) > no, "cherry picked from commit ..." without the - and with an "ed" ;) Thanks for pointing that out. I just copied and pasted from one of our wiki pages without paying too much attention, I've fixed the wiki page now :) > >> so the "commit" part is missing on those patches. The <upstream repo >> name> can be omitted if they come from upstream. This can be fixed when >> applying, but please keep it in mind for the next submissions :-). >> >> >> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> >> >> >>
On 08.01.19 21:18, Guilherme G. Piccoli wrote: > BugLink: https://bugs.launchpad.net/bugs/1791758 > > [Impact] > > * Line discipline code is racy when we have buffer being flush while the > tty is being initialized or reinitialized. For the first problem, we have > an upstream patch since January 2018: > b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf"); > although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent > ones. > > * For the race between the buffer flush while tty is being reopened, we > have a patch that addresses this issue recently merged for 5.0-rc1: > 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). > No Ubuntu kernel currently contains this patch, hence we're hereby > submitting the SRU request. The upstream complete patch series for > this is in [0]. > > * The approach of both patches are similar - they rely in locking/semaphore > to prevent race conditions. Some additional patches are necessary to prevent > correlated issues, like preventing a potential deadlock due to bad > prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to > c96cf923a98d ("tty: Don't block on IO when ldisc change is pending"). > All the necessary fixes are grouped here in this SRU request. > > * The symptom of the race condition between the buffer flush and the > tty reopen routine is a kernel crash with the following trace: > > > BUG: unable to handle kernel paging request at 0000000000002268 > IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0 > [...] > Call Trace: > [<addr>] ? kvm_sched_clock_read+0x1e/0x30 > [<addr>] n_tty_receive_buf2+0x14/0x20 > [<addr>] flush_to_ldisc+0xd5/0x120 > [<addr>] process_one_work+0x156/0x400 > [<addr>] worker_thread+0x11a/0x480 > [...] > > > * A kernel crash was collected from an user, analysis is present in > comment #4 in LP #1791758. > > > [Test Case] > > * It is not trivial to trigger this fault, but the usual recipe is to keep > accessing a machine through SSH (or keep killing getty when in IPMI serial > console) and in some way run commands before the terminal is ready in that > machine (like hacking some echo into ttySx or pts in an infinite loop). > > * We have reports of users that could reproduce this issue in their > production environment, and with the patches present in this SRU request > the problem was fixed. > > > [Regression Potential] > > * tty subsystem is highly central and patches in that area are always > delicate. For example, the upstream series [0] is a re-spin (V6) due to > a hard to reproduce issue reported in the PA-RISC architecture, which was > found in the V5 iteration [1] but was fixed by the patch c96cf923a98d, > present in this SRU request. > > * The patchset [0] is present in tty-next tree since mid-November, and the > patch b027e2298bd5 is available upstream since January/2018 (it's available > in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of > regressions is low. > > * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18) > and didn't show any issues. > > > [0] https://marc.info/?l=linux-kernel&m=154103190111795 > [1] https://marc.info/?l=linux-kernel&m=153737852618183 > > > Dmitry Safonov (4): > tty: Drop tty->count on tty_reopen() failure > tty: Hold tty_ldisc_lock() during tty_reopen() > tty: Don't block on IO when ldisc change is pending > tty: Simplify tty->count math in tty_reopen() > > Gaurav Kohli (1): > tty: fix data race between tty_init_dev and flush of buf > > drivers/tty/n_hdlc.c | 4 ++-- > drivers/tty/n_r3964.c | 2 +- > drivers/tty/n_tty.c | 8 ++++---- > drivers/tty/tty_io.c | 20 +++++++++++++++++--- > drivers/tty/tty_ldisc.c | 11 +++++++++-- > include/linux/tty.h | 9 +++++++++ > 6 files changed, 42 insertions(+), 12 deletions(-) > Just commenting to repeat the already mentioned fixup of the cherry-pick lines on commit. Acked-by: Stefan Bader <stefan.bader@canonical.com>
Fixed the provenance lines. On 2019-01-08 18:18:44 , Guilherme G. Piccoli wrote: > BugLink: https://bugs.launchpad.net/bugs/1791758 > > [Impact] > > * Line discipline code is racy when we have buffer being flush while the > tty is being initialized or reinitialized. For the first problem, we have > an upstream patch since January 2018: > b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf"); > although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent > ones. > > * For the race between the buffer flush while tty is being reopened, we > have a patch that addresses this issue recently merged for 5.0-rc1: > 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). > No Ubuntu kernel currently contains this patch, hence we're hereby > submitting the SRU request. The upstream complete patch series for > this is in [0]. > > * The approach of both patches are similar - they rely in locking/semaphore > to prevent race conditions. Some additional patches are necessary to prevent > correlated issues, like preventing a potential deadlock due to bad > prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to > c96cf923a98d ("tty: Don't block on IO when ldisc change is pending"). > All the necessary fixes are grouped here in this SRU request. > > * The symptom of the race condition between the buffer flush and the > tty reopen routine is a kernel crash with the following trace: > > > BUG: unable to handle kernel paging request at 0000000000002268 > IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0 > [...] > Call Trace: > [<addr>] ? kvm_sched_clock_read+0x1e/0x30 > [<addr>] n_tty_receive_buf2+0x14/0x20 > [<addr>] flush_to_ldisc+0xd5/0x120 > [<addr>] process_one_work+0x156/0x400 > [<addr>] worker_thread+0x11a/0x480 > [...] > > > * A kernel crash was collected from an user, analysis is present in > comment #4 in LP #1791758. > > > [Test Case] > > * It is not trivial to trigger this fault, but the usual recipe is to keep > accessing a machine through SSH (or keep killing getty when in IPMI serial > console) and in some way run commands before the terminal is ready in that > machine (like hacking some echo into ttySx or pts in an infinite loop). > > * We have reports of users that could reproduce this issue in their > production environment, and with the patches present in this SRU request > the problem was fixed. > > > [Regression Potential] > > * tty subsystem is highly central and patches in that area are always > delicate. For example, the upstream series [0] is a re-spin (V6) due to > a hard to reproduce issue reported in the PA-RISC architecture, which was > found in the V5 iteration [1] but was fixed by the patch c96cf923a98d, > present in this SRU request. > > * The patchset [0] is present in tty-next tree since mid-November, and the > patch b027e2298bd5 is available upstream since January/2018 (it's available > in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of > regressions is low. > > * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18) > and didn't show any issues. > > > [0] https://marc.info/?l=linux-kernel&m=154103190111795 > [1] https://marc.info/?l=linux-kernel&m=153737852618183 > > > Dmitry Safonov (4): > tty: Drop tty->count on tty_reopen() failure > tty: Hold tty_ldisc_lock() during tty_reopen() > tty: Don't block on IO when ldisc change is pending > tty: Simplify tty->count math in tty_reopen() > > Gaurav Kohli (1): > tty: fix data race between tty_init_dev and flush of buf > > drivers/tty/n_hdlc.c | 4 ++-- > drivers/tty/n_r3964.c | 2 +- > drivers/tty/n_tty.c | 8 ++++---- > drivers/tty/tty_io.c | 20 +++++++++++++++++--- > drivers/tty/tty_ldisc.c | 11 +++++++++-- > include/linux/tty.h | 9 +++++++++ > 6 files changed, 42 insertions(+), 12 deletions(-) > > -- > 2.19.2 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team