Message ID | 20210623004433.22819-1-matthew.ruffell@canonical.com |
---|---|
Series | btrfs: Attempting to balance a nearly full filesystem with relocated root nodes fails |
Acked-by: Tim Gardner <tim.gardner@canonical.com>

Good test results.

On 6/22/21 6:44 PM, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1933172
>
> [Impact]
>
> If you attempt to balance a btrfs filesystem that is nearly full, and this
> filesystem has had a lot of small, medium and large files created and deleted,
> such that the b-tree needs to be rotated, when the balance fails due to not
> having enough free space, the kernel oopses, and the btrfs filesystem hangs.
>
> It doesn't appear to cause any filesystem corruption, and is reproducible every
> time on affected filesystems.
>
> The following oops is generated:
>
> general protection fault: 0000 [#1] SMP PTI
> CPU: 0 PID: 18440 Comm: btrfs Not tainted 4.15.0-136-generic #140-Ubuntu
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]
> RSP: 0018:ffffb3db890a79e0 EFLAGS: 00010282
> RAX: ffff8d7f73861ad0 RBX: ffff8d7f78455708 RCX: ffff8d7f6d9a5390
> RDX: ffff8d7f73861ad0 RSI: a023775cfc0348a3 RDI: ffff8d7f6d9a5028
> RBP: ffffb3db890a7a78 R08: 0000000000000044 R09: 0000000000000228
> R10: ffff8d7f6d9a5000 R11: 0000000000000010 R12: ffffb3db890a7a08
> R13: ffff8d7f6d9a5000 R14: ffff8d7f6d9a5028 R15: ffff8d7f74560000
> FS: 00007f48d84498c0(0000) GS:ffff8d7f7fc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe4fbc1f000 CR3: 00000001799fc001 CR4: 0000000000160ef0
> Call Trace:
> ? commit_fs_roots+0x130/0x1b0 [btrfs]
> ? btrfs_run_delayed_refs.part.70+0x80/0x190 [btrfs]
> btrfs_commit_transaction+0x42c/0x910 [btrfs]
> ? start_transaction+0x191/0x430 [btrfs]
> relocate_block_group+0x1e7/0x640 [btrfs]
> btrfs_relocate_block_group+0x18f/0x280 [btrfs]
> btrfs_relocate_chunk+0x38/0xd0 [btrfs]
> __btrfs_balance+0x972/0xcd0 [btrfs]
> ? insert_balance_item.isra.35+0x391/0x3c0 [btrfs]
> btrfs_balance+0x32c/0x5a0 [btrfs]
> btrfs_ioctl_balance+0x320/0x390 [btrfs]
> btrfs_ioctl+0x5a6/0x2490 [btrfs]
> ? lru_cache_add_active_or_unevictable+0x36/0xb0
> ? __handle_mm_fault+0x9fd/0x1290
> do_vfs_ioctl+0xa8/0x630
> ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
> ? do_vfs_ioctl+0xa8/0x630
> ? __do_page_fault+0x2a1/0x4b0
> SyS_ioctl+0x79/0x90
> do_syscall_64+0x73/0x130
> entry_SYSCALL_64_after_hwframe+0x41/0xa6
> RIP: 0033:0x7f48d7228317
> RSP: 002b:00007ffd76d03e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f48d7228317
> RDX: 00007ffd76d03ec8 RSI: 00000000c4009420 RDI: 0000000000000003
> RBP: 00007ffd76d03ec8 R08: 0000000000000078 R09: 0000000000000000
> R10: 0000562086e7f010 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffd76d057cb R14: 0000000000000002 R15: 0000000000000000
> Code: 4d 85 e4 0f 84 56 fe ff ff 4d 89 04 24 41 c6 44 24 08 84 4d 89 4c 24 09 e9 42 fe ff ff 0f 0b e8 02 24 5e e0 66 90 0f 1f 44 00 00 <48> 8b 06 48 8b 0d c9 d4 99 e1 48 8b 15 d2 d4 99 e1 55 48 89 87
> RIP: btrfs_set_root_node+0x5/0x60 [btrfs] RSP: ffffb3db890a79e0
>
> I don't see this behaviour on any upstream kernel, and the first kernel to show
> this behaviour is 4.15.0-109-generic. The current 4.15.0-145-generic is still
> affected.
>
> I believe that this is a regression introduced in the fixing of CVE-2019-19036.
>
> [Testcase]
>
> I haven't reliably been able to create a script which places a btrfs filesystem
> into the state necessary to reproduce this issue, so I have just provided my
> qcow2 image with my btrfs filesystem which reproduces the issue 100% of the time.
>
> Download the image from here (warning: size is 8.0 GB):
>
> https://people.canonical.com/~mruffell/sf311164/ubuntu18.04-server-2.qcow2
>
> Create an Ubuntu 18.04 VM. Attach the ubuntu18.04-server-2.qcow2 image to a new
> virtio disk. Note, ubuntu18.04-server-2.qcow2 does not have an operating system,
> it is just a data-only volume.
>
> Mount the volume:
>
> $ sudo mount /dev/vdb /mnt
>
> Attempt to balance:
>
> $ sudo btrfs filesystem balance start --full-balance /mnt
> Segmentation fault (core dumped)
>
> Check dmesg for kernel oops:
> https://paste.ubuntu.com/p/wjJNqKBCfh/
>
> If you install the test kernel from the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf311164-test
>
> You should see this instead:
>
> $ sudo btrfs filesystem balance start --full-balance /mnt
> ERROR: error during balancing '/mnt': No space left on device
> There may be more info in syslog - try dmesg | tail
>
> Checking dmesg shows no kernel oops, and just info about the volume being too
> full to balance:
>
> https://paste.ubuntu.com/p/4J8Gq2dtz4/
>
> [Fix]
>
> I found the problem to be introduced in 4.15.0-109-generic, and
> 4.15.0-108-generic and earlier worked fine, which means we introduced a
> regression somewhere.
>
> I bisected the problem down to the following commit:
>
> ubuntu-bionic 6f536ce7a978531d38a21d092394616cefb54436
> Author: Qu Wenruo <wqu@suse.com>
> Date: Tue May 19 10:13:20 2020 +0800
> Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
> Link: https://paste.ubuntu.com/p/4qfWCM8ykh/
>
> Unfortunately, I believe this is a bad backport. If you examine the original
> upstream commit:
>
> commit 51415b6c1b117e223bc083e30af675cb5c5498f3
> Author: Qu Wenruo <wqu@suse.com>
> Date: Tue May 19 10:13:20 2020 +0800
> Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
> Link: https://github.com/torvalds/linux/commit/51415b6c1b117e223bc083e30af675cb5c5498f3
>
> You will see the 4.15 backport has calls to free_extent_buffer() and
> btrfs_put_fs_root(). Now, btrfs_put_fs_root() was renamed to btrfs_put_root()
> in the newer patches, and contains logic to free relocated roots, so I think
> we might not need the calls to free_extent_buffer() to free the extents first,
> since it might be handled later.
>
> The core issue is that we hit a general protection fault when attempting to
> access a root node, which means we have freed a root node we shouldn't have.
>
> If we look at the backport in 5.4.y, aka, the one in Focal:
>
> ubuntu-focal ecaee3a76ea998bc2fe20f056eb27f9bc837d116
> Author: Qu Wenruo <wqu@suse.com>
> Date: Tue May 19 10:13:20 2020 +0800
> Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
> Link: https://paste.ubuntu.com/p/PZrMqVt8Yk/
>
> It seems upstream -stable omitted the calls to btrfs_put_root() entirely, and
> we don't need the calls to free_extent_buffer() because of it.
>
> If I revert 6f536ce7a978531d38a21d092394616cefb54436 from ubuntu-bionic, and
> cherry-pick ecaee3a76ea998bc2fe20f056eb27f9bc837d116 from ubuntu-focal, and
> build, the problem no longer reproduces.
>
> [Where problems could occur]
>
> If a regression were to occur, it would affect users of btrfs filesystems, and
> would likely show during a routine balance operation. Since the issue is
> triggered during the cancellation of a balance operation, problems might occur
> for users with nearly full filesystems or filesystems that have existing
> corruption.
>
> We are replacing a patch that was backported during the fixing of CVE-2019-19036
> with a backport provided by upstream developers, which cherry-picks from 5.4.y
> to Bionic. The patch in 5.4.y is well tested by the community and is currently
> in the Focal kernel.
>
> With all modifications to btrfs, there is a risk of data corruption and
> filesystem corruption for all btrfs users, since balances happen automatically
> and on a regular basis. If a regression does happen, users should remount
> their filesystems with the "nobalance" flag, back up their data, and attempt a
> repair if necessary.
>
> [Other info]
>
> A community member has hit this issue before I did, and has reported it upstream
> to linux-btrfs here, although no one knew what was happening:
>
> https://www.spinics.net/lists/linux-btrfs/msg103367.html
>
> Matthew Ruffell (1):
>   Revert "btrfs: reloc: fix reloc root leak and NULL pointer
>     dereference"
>
> Qu Wenruo (1):
>   btrfs: reloc: fix reloc root leak and NULL pointer dereference
>
>  fs/btrfs/relocation.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
>
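
The kernel range given in the cover letter above (4.15.0-108-generic and earlier fine, 4.15.0-109-generic the first bad kernel, 4.15.0-145-generic still affected) can be checked against a running system with a small helper. This is only a sketch: classify_release and its three output strings are hypothetical names and are not part of the patch. The first fixed release is not stated in the thread, so the check cannot say where the affected range ends.

```shell
#!/bin/sh
# Hypothetical helper, not part of the submitted patch.
# Classifies a kernel release string against the range reported in the
# cover letter: 4.15.0-108 and earlier were fine, 4.15.0-109 was the first
# bad kernel. The upper bound of the range is NOT stated in the thread,
# so anything at or above -109 is only reported as "in-reported-range".
classify_release() {
    release=$1

    # Only the Bionic 4.15.0 series is discussed in the report.
    case "$release" in
        4.15.0-*) ;;
        *) echo "unknown"; return 0 ;;
    esac

    abi=${release#4.15.0-}   # drop the series prefix, e.g. "136-generic"
    abi=${abi%%-*}           # keep the ABI number only, e.g. "136"

    # Reject anything that is not a plain decimal number.
    case "$abi" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac

    if [ "$abi" -ge 109 ]; then
        echo "in-reported-range"
    else
        echo "before-regression"
    fi
}
```

To check the running kernel, call classify_release "$(uname -r)". An "in-reported-range" result only means the kernel is at or after the first bad release; whether a given build already carries the revert and the replacement backport has to be confirmed from its changelog.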
On Tue, Jun 22, 2021 at 9:45 PM Matthew Ruffell <matthew.ruffell@canonical.com> wrote:
> [...]

Hi Matthew, great analysis, thanks a bunch! The revert + fix seems good to me, so:

Acked-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Applied to Bionic master-next. Thank you!

-Kelsey

On 2021-06-23 12:44:31, Matthew Ruffell wrote:
> [...]