[SRU,D/E/F/G/U,0/1] Revert code that causes crashes on default configurations

Message ID 20200709195402.1835538-1-cascardo@canonical.com

Message

Thadeu Lima de Souza Cascardo July 9, 2020, 7:54 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1886668

This is the same patch as sent to Bionic 4.15 kernels.

[Impact]
On systems that use cgroups and sockets extensively, such as Docker,
Kubernetes, LXD or libvirt hosts, a crash might happen when running linux
4.15.0-109-generic.

[Fix]
Revert the patch that disables sk_alloc cgroup refcounting when tasks
are added to net_prio cgroup.
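
To illustrate why switching refcounting off at attach time can be dangerous,
here is a toy Python model (not kernel code). It assumes the failure mode is a
reference being dropped that was never taken: a socket allocated before the
switch holds a cgroup pointer, a clone made after the switch copies that
pointer without taking a reference, and both sockets later drop one. All class
and function names below are hypothetical and only loosely mirror the socket
allocation, clone and free paths.

```python
# Toy model only, NOT kernel code. Every name here is hypothetical.

class Cgroup:
    def __init__(self):
        # One base reference, held by the cgroup hierarchy itself.
        self.refs = 1


class Socket:
    def __init__(self, cgroup):
        # The socket may carry a pointer to the cgroup it was created in.
        self.cgroup = cgroup


class Model:
    def __init__(self):
        # Assumption: a single global switch, flipped when a task is
        # attached to a net_prio cgroup (the behaviour being reverted).
        self.refcounting_enabled = True

    def sk_alloc(self, cgroup):
        if not self.refcounting_enabled:
            return Socket(None)
        cgroup.refs += 1
        return Socket(cgroup)

    def sk_clone(self, parent):
        # The clone copies the parent's pointer unconditionally ...
        child = Socket(parent.cgroup)
        # ... but only takes a reference while refcounting is enabled.
        if self.refcounting_enabled and child.cgroup is not None:
            child.cgroup.refs += 1
        return child

    def sk_free(self, sock):
        # Freeing always drops a reference if a pointer is present.
        if sock.cgroup is not None:
            sock.cgroup.refs -= 1


def run(attach_to_net_prio):
    """Return the cgroup's remaining references after all sockets are freed."""
    model = Model()
    cgroup = Cgroup()
    listener = model.sk_alloc(cgroup)
    if attach_to_net_prio:
        model.refcounting_enabled = False
    child = model.sk_clone(listener)
    model.sk_free(child)
    model.sk_free(listener)
    return cgroup.refs
```

In this model, run(False) leaves the base reference intact (1), while run(True)
ends at 0: the cgroup loses its last reference even though the hierarchy still
uses it, which is the kind of imbalance that can end in a crash.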

[Test case]
Verify that environments where the issue reproduces survive several hours of
uptime. A different bug was reproduced with work-in-progress code, but it did
not reproduce with the culprit reverted.
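
A minimal sketch of how such a soak check could be automated, in Python. The
crash signatures, the 4-hour default threshold and the function names are
assumptions made for illustration; they are not taken from the bug report and
should be adjusted to whatever the affected systems actually log.

```python
import re

# Assumed crash signatures; adjust to match the traces seen in the bug report.
CRASH_PATTERNS = [
    r"kernel BUG at",
    r"BUG: unable to handle",
    r"general protection fault",
    r"Kernel panic",
    r"Oops:",
]
_CRASH_RE = re.compile("|".join(CRASH_PATTERNS))


def find_crash_lines(log_text):
    """Return the log lines that match any of the assumed crash signatures."""
    return [line for line in log_text.splitlines() if _CRASH_RE.search(line)]


def survived(uptime_seconds, log_text, min_hours=4.0):
    """True if the host has been up long enough and its log shows no crash."""
    long_enough = uptime_seconds >= min_hours * 3600
    return long_enough and not find_crash_lines(log_text)


def read_uptime_seconds(path="/proc/uptime"):
    """Read the uptime in seconds from /proc/uptime (first field)."""
    with open(path) as f:
        return float(f.read().split()[0])
```

On a test host, survived(read_uptime_seconds(), log_text) could be called with
log_text holding the captured dmesg output. A host that panicked and rebooted
fails the uptime check even if its current log is clean.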

[Regression potential]
The reverted commit fixes a memory leak in similar scenarios. But a leak is
better than a crash. Two other bugs have been opened to track a real fix for
this issue and the leak.

Thadeu Lima de Souza Cascardo (1):
  UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2
    cgroups"

 net/core/netprio_cgroup.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Marcelo Henrique Cerri July 9, 2020, 8:07 p.m. UTC | #1
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>

On Thu, Jul 09, 2020 at 04:54:00PM -0300, Thadeu Lima de Souza Cascardo wrote:
> BugLink: https://bugs.launchpad.net/bugs/1886668
> 
> This is the same patch as sent to Bionic 4.15 kernels.
> 
> [Impact]
> On systems using cgroups and sockets extensively, like docker, kubernetes,
> lxd, libvirt, a crash might happen when using linux 4.15.0-109-generic.
> 
> [Fix]
> Revert the patch that disables sk_alloc cgroup refcounting when tasks
> are added to net_prio cgroup.
> 
> [Test case]
> Test that such environments where the issue is reproduced survive some hours of
> uptime. A different bug was reproduced with a work-in-progress code and was not
> reproduced with the culprit reverted.
> 
> [Regression potential]
> The reverted commit fix a memory leak on similar scenarios. But a leak is
> better than a crash. Two other bugs have been opened to track a real fix for
> this issue and the leak.
> 
> Thadeu Lima de Souza Cascardo (1):
>   UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2
>     cgroups"
> 
>  net/core/netprio_cgroup.c | 2 --
>  1 file changed, 2 deletions(-)
Ian May July 9, 2020, 8:33 p.m. UTC | #2
Acked-by: Ian May <ian.may@canonical.com>

On 7/9/20 2:54 PM, Thadeu Lima de Souza Cascardo wrote:
> BugLink: https://bugs.launchpad.net/bugs/1886668
>
> This is the same patch as sent to Bionic 4.15 kernels.
>
> [Impact]
> On systems using cgroups and sockets extensively, like docker, kubernetes,
> lxd, libvirt, a crash might happen when using linux 4.15.0-109-generic.
>
> [Fix]
> Revert the patch that disables sk_alloc cgroup refcounting when tasks
> are added to net_prio cgroup.
>
> [Test case]
> Test that such environments where the issue is reproduced survive some hours of
> uptime. A different bug was reproduced with a work-in-progress code and was not
> reproduced with the culprit reverted.
>
> [Regression potential]
> The reverted commit fix a memory leak on similar scenarios. But a leak is
> better than a crash. Two other bugs have been opened to track a real fix for
> this issue and the leak.
>
> Thadeu Lima de Souza Cascardo (1):
>   UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2
>     cgroups"
>
>  net/core/netprio_cgroup.c | 2 --
>  1 file changed, 2 deletions(-)
>
Khalid Elmously July 9, 2020, 11:41 p.m. UTC | #3
On 2020-07-09 16:54:00 , Thadeu Lima de Souza Cascardo wrote:
> BugLink: https://bugs.launchpad.net/bugs/1886668
> 
> This is the same patch as sent to Bionic 4.15 kernels.
> 
> [Impact]
> On systems using cgroups and sockets extensively, like docker, kubernetes,
> lxd, libvirt, a crash might happen when using linux 4.15.0-109-generic.
> 
> [Fix]
> Revert the patch that disables sk_alloc cgroup refcounting when tasks
> are added to net_prio cgroup.
> 
> [Test case]
> Test that such environments where the issue is reproduced survive some hours of
> uptime. A different bug was reproduced with a work-in-progress code and was not
> reproduced with the culprit reverted.
> 
> [Regression potential]
> The reverted commit fix a memory leak on similar scenarios. But a leak is
> better than a crash. Two other bugs have been opened to track a real fix for
> this issue and the leak.
> 
> Thadeu Lima de Souza Cascardo (1):
>   UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2
>     cgroups"
> 
>  net/core/netprio_cgroup.c | 2 --
>  1 file changed, 2 deletions(-)
Thomas Lamprecht July 14, 2020, 6:49 a.m. UTC | #4
On 10.07.20 01:41, Khaled Elmously wrote:
> On 2020-07-09 16:54:00 , Thadeu Lima de Souza Cascardo wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1886668
>>
>> This is the same patch as sent to Bionic 4.15 kernels.
>>
>> [Impact]
>> On systems using cgroups and sockets extensively, like docker, kubernetes,
>> lxd, libvirt, a crash might happen when using linux 4.15.0-109-generic.
>>
>> [Fix]
>> Revert the patch that disables sk_alloc cgroup refcounting when tasks
>> are added to net_prio cgroup.
>>

Just FYI: The upstream "real fix" for this is [0] plus a followup [1].
I originally wanted to send it to the Ubuntu kernel list, but got
overloaded with other things and totally forgot about it, sorry!
Seeing the new Ubuntu tag just now, I figured that sharing this late is
better than never.

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad0f75e5f57ccbceec13274e1e242f2b5a6397ed
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14b032b8f8fce03a546dcf365454bec8c4a58d7d

>> [Test case]
>> Test that such environments where the issue is reproduced survive some hours of
>> uptime. A different bug was reproduced with a work-in-progress code and was not
>> reproduced with the culprit reverted.

There's a more direct proposed reproducer [2].

[2]: https://lore.kernel.org/netdev/42baf0f8-627b-0ab8-72fc-12d24667ad0a@huawei.com/

cheers,
Thomas
Seth Forshee July 17, 2020, 10:28 p.m. UTC | #5
On Thu, Jul 09, 2020 at 04:54:00PM -0300, Thadeu Lima de Souza Cascardo wrote:
> BugLink: https://bugs.launchpad.net/bugs/1886668
> 
> This is the same patch as sent to Bionic 4.15 kernels.
> 
> [Impact]
> On systems using cgroups and sockets extensively, like docker, kubernetes,
> lxd, libvirt, a crash might happen when using linux 4.15.0-109-generic.
> 
> [Fix]
> Revert the patch that disables sk_alloc cgroup refcounting when tasks
> are added to net_prio cgroup.
> 
> [Test case]
> Test that such environments where the issue is reproduced survive some hours of
> uptime. A different bug was reproduced with a work-in-progress code and was not
> reproduced with the culprit reverted.
> 
> [Regression potential]
> The reverted commit fix a memory leak on similar scenarios. But a leak is
> better than a crash. Two other bugs have been opened to track a real fix for
> this issue and the leak.

Applied to unstable, thanks!