mbox series

[SRU,X,0/5] Line discipline buffer flush/tty_reopen() race fix

Message ID 20190108201849.11907-1-gpiccoli@canonical.com
Headers show
Series Line discipline buffer flush/tty_reopen() race fix | expand

Message

Guilherme G. Piccoli Jan. 8, 2019, 8:18 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1791758

[Impact]

* Line discipline code is racy when we have buffer being flush while the
tty is being initialized or reinitialized. For the first problem, we have
an upstream patch since January 2018:
b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
ones.

* For the race between the buffer flush while tty is being reopened, we
have a patch that addresses this issue recently merged for 5.0-rc1:
83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
No Ubuntu kernel currently contains this patch, hence we're hereby
submitting the SRU request. The upstream complete patch series for
this is in [0].

* The approach of both patches are similar - they rely in locking/semaphore
to prevent race conditions. Some additional patches are necessary to prevent
correlated issues, like preventing a potential deadlock due to bad
prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
All the necessary fixes are grouped here in this SRU request.

* The symptom of the race condition between the buffer flush and the
tty reopen routine is a kernel crash with the following trace:


BUG: unable to handle kernel paging request at 0000000000002268
IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
[...]
Call Trace:
[<addr>] ? kvm_sched_clock_read+0x1e/0x30
[<addr>] n_tty_receive_buf2+0x14/0x20
[<addr>] flush_to_ldisc+0xd5/0x120
[<addr>] process_one_work+0x156/0x400
[<addr>] worker_thread+0x11a/0x480
[...]


* A kernel crash was collected from an user, analysis is present in
comment #4 in LP #1791758.


[Test Case]

* It is not trivial to trigger this fault, but the usual recipe is to keep
accessing a machine through SSH (or keep killing getty when in IPMI serial
console) and in some way run commands before the terminal is ready in that
machine (like hacking some echo into ttySx or pts in an infinite loop).

* We have reports of users that could reproduce this issue in their
production environment, and with the patches present in this SRU request
the problem was fixed.


[Regression Potential]

* tty subsystem is highly central and patches in that area are always
delicate. For example, the upstream series [0] is a re-spin (V6) due to
a hard to reproduce issue reported in the PA-RISC architecture, which was
found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
present in this SRU request.

* The patchset [0] is present in tty-next tree since mid-November, and the
patch b027e2298bd5 is available upstream since January/2018 (it's available
in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
regressions is low.

* These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
and didn't show any issues.


[0] https://marc.info/?l=linux-kernel&m=154103190111795
[1] https://marc.info/?l=linux-kernel&m=153737852618183


Dmitry Safonov (4):
  tty: Drop tty->count on tty_reopen() failure
  tty: Hold tty_ldisc_lock() during tty_reopen()
  tty: Don't block on IO when ldisc change is pending
  tty: Simplify tty->count math in tty_reopen()

Gaurav Kohli (1):
  tty: fix data race between tty_init_dev and flush of buf

 drivers/tty/n_hdlc.c    |  4 ++--
 drivers/tty/n_r3964.c   |  2 +-
 drivers/tty/n_tty.c     |  8 ++++----
 drivers/tty/tty_io.c    | 20 +++++++++++++++++---
 drivers/tty/tty_ldisc.c | 11 +++++++++--
 include/linux/tty.h     |  9 +++++++++
 6 files changed, 42 insertions(+), 12 deletions(-)

Comments

Kleber Sacilotto de Souza Jan. 9, 2019, 9:57 a.m. UTC | #1
On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1791758
>
> [Impact]
>
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
>
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
>
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
>
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
>
>
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
>
>
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
>
>
> [Test Case]
>
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
>
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
>
>
> [Regression Potential]
>
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
>
> * The patchset [0] is present in tty-next tree since mid-November, and the
> patch b027e2298bd5 is available upstream since January/2018 (it's available
> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
> regressions is low.
>
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
>
>
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>
>
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
>
> Gaurav Kohli (1):
>   tty: fix data race between tty_init_dev and flush of buf
>
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 20 +++++++++++++++++---
>  drivers/tty/tty_ldisc.c | 11 +++++++++--
>  include/linux/tty.h     |  9 +++++++++
>  6 files changed, 42 insertions(+), 12 deletions(-)
>
Those are indeed some sensitive changes and a bit intrusive, but they
are needed to fix the issue and most of them have been around on
mainline for some time now. I guess we'll also detect issues with
regression tests if they can be easily broken.

Please note that the required format for the provenance of the patch is:

 (backported from commit <sha1> <upstream repo name>)
or
 (cherry-pick from commit <sha1> <upstream repo name>)

so the "commit" part is missing on those patches. The <upstream repo
name> can be omitted if they come from upstream. This can be fixed when
applying, but please keep it in mind for the next submissions :-).


Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Guilherme G. Piccoli Jan. 9, 2019, 10:01 a.m. UTC | #2
On Wed, Jan 9, 2019 at 7:57 AM Kleber Souza <kleber.souza@canonical.com> wrote:
> [...]
> Those are indeed some sensitive changes and a bit intrusive, but they
> are needed to fix the issue and most of them have been around on
> mainline for some time now. I guess we'll also detect issues with
> regression tests if they can be easily broken.
>
> Please note that the required format for the provenance of the patch is:
>
>  (backported from commit <sha1> <upstream repo name>)
> or
>  (cherry-pick from commit <sha1> <upstream repo name>)
>
> so the "commit" part is missing on those patches. The <upstream repo
> name> can be omitted if they come from upstream. This can be fixed when
> applying, but please keep it in mind for the next submissions :-).
>
>
> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
>
>

Thanks Kleber, for the ack! Sorry for the missing "commit" word, I'm not sure
how I forgot it heheh
Sent some SRUs last weeks, and for some reason, my mind forgot this here.

Cheers,


Guilherme
Stefan Bader Jan. 9, 2019, 10:10 a.m. UTC | #3
On 09.01.19 10:57, Kleber Souza wrote:
> On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1791758
>>
>> [Impact]
>>
>> * Line discipline code is racy when we have buffer being flush while the
>> tty is being initialized or reinitialized. For the first problem, we have
>> an upstream patch since January 2018:
>> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
>> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
>> ones.
>>
>> * For the race between the buffer flush while tty is being reopened, we
>> have a patch that addresses this issue recently merged for 5.0-rc1:
>> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
>> No Ubuntu kernel currently contains this patch, hence we're hereby
>> submitting the SRU request. The upstream complete patch series for
>> this is in [0].
>>
>> * The approach of both patches are similar - they rely in locking/semaphore
>> to prevent race conditions. Some additional patches are necessary to prevent
>> correlated issues, like preventing a potential deadlock due to bad
>> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
>> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
>> All the necessary fixes are grouped here in this SRU request.
>>
>> * The symptom of the race condition between the buffer flush and the
>> tty reopen routine is a kernel crash with the following trace:
>>
>>
>> BUG: unable to handle kernel paging request at 0000000000002268
>> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
>> [...]
>> Call Trace:
>> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
>> [<addr>] n_tty_receive_buf2+0x14/0x20
>> [<addr>] flush_to_ldisc+0xd5/0x120
>> [<addr>] process_one_work+0x156/0x400
>> [<addr>] worker_thread+0x11a/0x480
>> [...]
>>
>>
>> * A kernel crash was collected from an user, analysis is present in
>> comment #4 in LP #1791758.
>>
>>
>> [Test Case]
>>
>> * It is not trivial to trigger this fault, but the usual recipe is to keep
>> accessing a machine through SSH (or keep killing getty when in IPMI serial
>> console) and in some way run commands before the terminal is ready in that
>> machine (like hacking some echo into ttySx or pts in an infinite loop).
>>
>> * We have reports of users that could reproduce this issue in their
>> production environment, and with the patches present in this SRU request
>> the problem was fixed.
>>
>>
>> [Regression Potential]
>>
>> * tty subsystem is highly central and patches in that area are always
>> delicate. For example, the upstream series [0] is a re-spin (V6) due to
>> a hard to reproduce issue reported in the PA-RISC architecture, which was
>> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
>> present in this SRU request.
>>
>> * The patchset [0] is present in tty-next tree since mid-November, and the
>> patch b027e2298bd5 is available upstream since January/2018 (it's available
>> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
>> regressions is low.
>>
>> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
>> and didn't show any issues.
>>
>>
>> [0] https://marc.info/?l=linux-kernel&m=154103190111795
>> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>>
>>
>> Dmitry Safonov (4):
>>   tty: Drop tty->count on tty_reopen() failure
>>   tty: Hold tty_ldisc_lock() during tty_reopen()
>>   tty: Don't block on IO when ldisc change is pending
>>   tty: Simplify tty->count math in tty_reopen()
>>
>> Gaurav Kohli (1):
>>   tty: fix data race between tty_init_dev and flush of buf
>>
>>  drivers/tty/n_hdlc.c    |  4 ++--
>>  drivers/tty/n_r3964.c   |  2 +-
>>  drivers/tty/n_tty.c     |  8 ++++----
>>  drivers/tty/tty_io.c    | 20 +++++++++++++++++---
>>  drivers/tty/tty_ldisc.c | 11 +++++++++--
>>  include/linux/tty.h     |  9 +++++++++
>>  6 files changed, 42 insertions(+), 12 deletions(-)
>>
> Those are indeed some sensitive changes and a bit intrusive, but they
> are needed to fix the issue and most of them have been around on
> mainline for some time now. I guess we'll also detect issues with
> regression tests if they can be easily broken.
> 
> Please note that the required format for the provenance of the patch is:
> 
>  (backported from commit <sha1> <upstream repo name>)
> or
>  (cherry-pick from commit <sha1> <upstream repo name>)

no, "cherry picked from commit ..." without the - and with an "ed" ;)

> 
> so the "commit" part is missing on those patches. The <upstream repo
> name> can be omitted if they come from upstream. This can be fixed when
> applying, but please keep it in mind for the next submissions :-).
> 
> 
> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
> 
> 
>
Kleber Sacilotto de Souza Jan. 9, 2019, 10:32 a.m. UTC | #4
On 1/9/19 11:10 AM, Stefan Bader wrote:
> On 09.01.19 10:57, Kleber Souza wrote:
>> On 1/8/19 9:18 PM, Guilherme G. Piccoli wrote:
>>> BugLink: https://bugs.launchpad.net/bugs/1791758
>>>
>>> [Impact]
>>>
>>> * Line discipline code is racy when we have buffer being flush while the
>>> tty is being initialized or reinitialized. For the first problem, we have
>>> an upstream patch since January 2018:
>>> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
>>> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
>>> ones.
>>>
>>> * For the race between the buffer flush while tty is being reopened, we
>>> have a patch that addresses this issue recently merged for 5.0-rc1:
>>> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
>>> No Ubuntu kernel currently contains this patch, hence we're hereby
>>> submitting the SRU request. The upstream complete patch series for
>>> this is in [0].
>>>
>>> * The approach of both patches are similar - they rely in locking/semaphore
>>> to prevent race conditions. Some additional patches are necessary to prevent
>>> correlated issues, like preventing a potential deadlock due to bad
>>> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
>>> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
>>> All the necessary fixes are grouped here in this SRU request.
>>>
>>> * The symptom of the race condition between the buffer flush and the
>>> tty reopen routine is a kernel crash with the following trace:
>>>
>>>
>>> BUG: unable to handle kernel paging request at 0000000000002268
>>> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
>>> [...]
>>> Call Trace:
>>> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
>>> [<addr>] n_tty_receive_buf2+0x14/0x20
>>> [<addr>] flush_to_ldisc+0xd5/0x120
>>> [<addr>] process_one_work+0x156/0x400
>>> [<addr>] worker_thread+0x11a/0x480
>>> [...]
>>>
>>>
>>> * A kernel crash was collected from an user, analysis is present in
>>> comment #4 in LP #1791758.
>>>
>>>
>>> [Test Case]
>>>
>>> * It is not trivial to trigger this fault, but the usual recipe is to keep
>>> accessing a machine through SSH (or keep killing getty when in IPMI serial
>>> console) and in some way run commands before the terminal is ready in that
>>> machine (like hacking some echo into ttySx or pts in an infinite loop).
>>>
>>> * We have reports of users that could reproduce this issue in their
>>> production environment, and with the patches present in this SRU request
>>> the problem was fixed.
>>>
>>>
>>> [Regression Potential]
>>>
>>> * tty subsystem is highly central and patches in that area are always
>>> delicate. For example, the upstream series [0] is a re-spin (V6) due to
>>> a hard to reproduce issue reported in the PA-RISC architecture, which was
>>> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
>>> present in this SRU request.
>>>
>>> * The patchset [0] is present in tty-next tree since mid-November, and the
>>> patch b027e2298bd5 is available upstream since January/2018 (it's available
>>> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
>>> regressions is low.
>>>
>>> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
>>> and didn't show any issues.
>>>
>>>
>>> [0] https://marc.info/?l=linux-kernel&m=154103190111795
>>> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>>>
>>>
>>> Dmitry Safonov (4):
>>>   tty: Drop tty->count on tty_reopen() failure
>>>   tty: Hold tty_ldisc_lock() during tty_reopen()
>>>   tty: Don't block on IO when ldisc change is pending
>>>   tty: Simplify tty->count math in tty_reopen()
>>>
>>> Gaurav Kohli (1):
>>>   tty: fix data race between tty_init_dev and flush of buf
>>>
>>>  drivers/tty/n_hdlc.c    |  4 ++--
>>>  drivers/tty/n_r3964.c   |  2 +-
>>>  drivers/tty/n_tty.c     |  8 ++++----
>>>  drivers/tty/tty_io.c    | 20 +++++++++++++++++---
>>>  drivers/tty/tty_ldisc.c | 11 +++++++++--
>>>  include/linux/tty.h     |  9 +++++++++
>>>  6 files changed, 42 insertions(+), 12 deletions(-)
>>>
>> Those are indeed some sensitive changes and a bit intrusive, but they
>> are needed to fix the issue and most of them have been around on
>> mainline for some time now. I guess we'll also detect issues with
>> regression tests if they can be easily broken.
>>
>> Please note that the required format for the provenance of the patch is:
>>
>>  (backported from commit <sha1> <upstream repo name>)
>> or
>>  (cherry-pick from commit <sha1> <upstream repo name>)
> no, "cherry picked from commit ..." without the - and with an "ed" ;)

Thanks for pointing that out. I just copied and pasted from one of our
wiki pages without paying too much attention, I've fixed the wiki page
now :)

>
>> so the "commit" part is missing on those patches. The <upstream repo
>> name> can be omitted if they come from upstream. This can be fixed when
>> applying, but please keep it in mind for the next submissions :-).
>>
>>
>> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
>>
>>
>>
Stefan Bader Jan. 9, 2019, 2:02 p.m. UTC | #5
On 08.01.19 21:18, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1791758
> 
> [Impact]
> 
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
> 
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
> 
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
> 
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
> 
> 
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
> 
> 
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
> 
> 
> [Test Case]
> 
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
> 
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
> 
> 
> [Regression Potential]
> 
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
> 
> * The patchset [0] is present in tty-next tree since mid-November, and the
> patch b027e2298bd5 is available upstream since January/2018 (it's available
> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
> regressions is low.
> 
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
> 
> 
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
> 
> 
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
> 
> Gaurav Kohli (1):
>   tty: fix data race between tty_init_dev and flush of buf
> 
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 20 +++++++++++++++++---
>  drivers/tty/tty_ldisc.c | 11 +++++++++--
>  include/linux/tty.h     |  9 +++++++++
>  6 files changed, 42 insertions(+), 12 deletions(-)
> 
Just commenting to repeat the already mentioned fixup of the cherry-pick lines
on commit.

Acked-by: Stefan Bader <stefan.bader@canonical.com>
Khalid Elmously Jan. 10, 2019, midnight UTC | #6
Fixed the provenance lines.


On 2019-01-08 18:18:44 , Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1791758
> 
> [Impact]
> 
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
> 
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
> 
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
> 
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
> 
> 
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
> 
> 
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
> 
> 
> [Test Case]
> 
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
> 
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
> 
> 
> [Regression Potential]
> 
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
> 
> * The patchset [0] is present in tty-next tree since mid-November, and the
> patch b027e2298bd5 is available upstream since January/2018 (it's available
> in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
> regressions is low.
> 
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
> 
> 
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
> 
> 
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
> 
> Gaurav Kohli (1):
>   tty: fix data race between tty_init_dev and flush of buf
> 
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 20 +++++++++++++++++---
>  drivers/tty/tty_ldisc.c | 11 +++++++++--
>  include/linux/tty.h     |  9 +++++++++
>  6 files changed, 42 insertions(+), 12 deletions(-)
> 
> -- 
> 2.19.2
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team