mbox series

[SRU,B,F,0/1] btrfs/154: rename fails with EOVERFLOW when calculating item size during item key collision

Message ID 20230130021131.19564-1-matthew.ruffell@canonical.com
Headers show
Series btrfs/154: rename fails with EOVERFLOW when calculating item size during item key collision | expand

Message

Matthew Ruffell Jan. 30, 2023, 2:11 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2004132

[Impact]

xfstests btrfs/154 fails on both Bionic and Focal, leading to a kernel oops and
the btrfs volume being forced readonly.

In btrfs, item key collision is allowed for some item types, namely dir item and
inode references. When inserting items into the btree, there are two objects,
the btrfs_item and the item data. These objects must fit within the btree
nodesize.

When a hash collision occurs, and we call btrfs_search_slot() to place the
objects in the tree, when btrfs_search_slot() reaches the leaf node, a check is
performed to see if we need to split the leaf. The check is incorrect, returning
that we need to split the leaf, since it thinks that both btrfs_item and the
item data need to be inserted, when in reality, the item can be merged with the
existing one and no new btrfs_item will be inserted.

split_leaf() will return EOVERFLOW from following code:
    
  if (extend && data_size + btrfs_item_size_nr(l, slot) +
      sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
      return -EOVERFLOW;

In the rename case, btrfs_check_dir_item_collision() is called early stages of
treewalking, and correctly calculates the needed size, taking into account that
a hash collision has occurred.

  data_size = sizeof(*di) + name_len;
      if (data_size + btrfs_item_size_nr(leaf, slot) +
          sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))

The two sizes reported from btrfs_check_dir_item_collision() and
btrfs_search_slot() are different, and rename fails due to split_leaf()
returning -EOVERFLOW, leading to transaction abort and forcing the volume
readonly.

Kernel oops:

BTRFS: Transaction aborted (error -75)
WARNING: CPU: 0 PID: 2921 at /build/linux-fTmV3T/linux-4.15.0/fs/btrfs/inode.c:10217 btrfs_rename+0xcf1/0xdf0 [btrfs]
CPU: 0 PID: 2921 Comm: python3 Not tainted 4.15.0-202-generic #213-Ubuntu
RIP: 0010:btrfs_rename+0xcf1/0xdf0 [btrfs]
RSP: 0018:ffff9e6f4183fd20 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff91a493f27b98 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff91a4bfc1b4d0
RBP: ffff9e6f4183fdc0 R08: 00000000000002b4 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000236
R13: ffff91a493f56518 R14: ffff91a4b6b57b40 R15: ffff91a493f27b98
FS:  00007f6041081740(0000) GS:ffff91a4bfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6040fe84c8 CR3: 000000015c8ca005 CR4: 0000000000760ef0
PKRU: 55555554
Call Trace:
 btrfs_rename2+0x1d/0x30 [btrfs]
 vfs_rename+0x46e/0x960
 SyS_rename+0x362/0x3c0
 do_syscall_64+0x73/0x130
 entry_SYSCALL_64_after_hwframe+0x41/0xa6
Code: 0f ba a8 d0 cd 00 00 02 72 2b 41 83 f8 fb 0f 84 d9 00 00 00 44 89 c6 48 c7 c7 68 43 4b c0 44 89 55 80 44 89 45 98 e8 8f 5c a6 d0 <0f> 0b 44 8b 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98 
---[ end trace 9c6b87a19f4436f3 ]---
BTRFS: error (device vdd) in btrfs_rename:10217: errno=-75 unknown
BTRFS info (device vdd): forced readonly

[Testcase]

Start a fresh Bionic or Focal VM. 

Attach two scratch disks, I used standard virtio disks with 3gb of storage each.
These disks are /dev/vdc and /dev/vdd.

Compile xfstests:

$ sudo apt-get install acl attr automake bc dbench dump e2fsprogs fio gawk \
        gcc git indent libacl1-dev libaio-dev libcap-dev libgdbm-dev libtool \
        libtool-bin  libuuid1 lvm2 make psmisc python3 quota sed \
        uuid-dev uuid-runtime xfsprogs linux-headers-$(uname -r) sqlite3 make
$ sudo apt-get install  f2fs-tools ocfs2-tools udftools xfsdump \
        xfslibs-dev
$ git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
$ cd xfstests-dev
$ make
$ sudo su
# mkdir /test
# mkdir /scratch
# mkfs.btrfs -f /dev/vdc
# cat << EOF >> ./local.config
export TEST_DEV=/dev/vdc
export TEST_DIR=/test
export SCRATCH_DEV=/dev/vdd
export SCRATCH_MNT=/scratch
EOF

# ./check btrfs/154

btrfs/154       _check_dmesg: something found in dmesg (see /home/ubuntu/xfstests-dev/results//btrfs/154.dmesg)
- output mismatch (see /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad)
    --- tests/btrfs/154.out	2023-01-28 02:53:03.566450703 +0000
    +++ /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad	2023-01-28 06:51:34.113121042 +0000
    @@ -1,2 +1,6 @@
     QA output created by 154
    +Traceback (most recent call last):
    +  File "/home/ubuntu/xfstests-dev/src/btrfs_crc32c_forged_name.py", line 99, in <module>
    +    os.rename(srcpath, dstpath)
    +OSError: [Errno 75] Value too large for defined data type: '/scratch/309' -> b'/scratch/fa5d\x90O\x97015d7a47c48fdeb99b857c17e8038da6382fcb05e3a6b367589a8f54a8c3c1327584dfa630b4bd8c5bbb37b4decc2b82fecb4c940e0bd0989166c44e6af2855e9e42a02a57cffa2fc5283525ac53b2b0d2baaf874ff50b'
Ran: btrfs/154
Failures: btrfs/154
Failed 1 of 1 tests

If you examine dmesg, you will see the oops from the impact section.

If you install the test kernel from the below ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf349467-btrfs-154

The issue no longer occurs:

# ./check btrfs/154

Ran: btrfs/154
Passed all 1 tests

[Fix]

This was fixed in 5.11-rc3 by the below commit:

commit 9a664971569daf68254928149f580b4f5856d274
Author: ethanwu <ethanwu@synology.com>
Date:   Tue Dec 1 17:25:12 2020 +0800
Subject: btrfs: correctly calculate item size used when item key collision happens
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a664971569daf68254928149f580b4f5856d274

This cherry-picks to Focal, and required a small backport to Bionic, removing
the hunk that contained comments explaining the parameters to
btrfs_search_slot().

[Where problems could occur]

Problems could occur when calculating the size required for btrfs_item and item
data when hash collisions occur. Such collisions are rare in of itself, but
possible if you have a large amount files or craft filenames to force collisions
with the crc32 hash algorithm.

If a regression were to occur, it could cause more transactions to be aborted,
and would likely result in the users volume being forced read only. They might
not lose any existing data, but data being written might be lost.

ethanwu (1):
  btrfs: correctly calculate item size used when item key collision
    happens

 fs/btrfs/ctree.c       | 24 ++++++++++++++++++++++--
 fs/btrfs/ctree.h       |  6 ++++++
 fs/btrfs/extent-tree.c |  2 ++
 fs/btrfs/file-item.c   |  2 ++
 4 files changed, 32 insertions(+), 2 deletions(-)

Comments

Tim Gardner Jan. 30, 2023, 3:57 p.m. UTC | #1
On 1/29/23 7:11 PM, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2004132
> 
> [Impact]
> 
> xfstests btrfs/154 fails on both Bionic and Focal, leading to a kernel oops and
> the btrfs volume being forced readonly.
> 
> In btrfs, item key collision is allowed for some item types, namely dir item and
> inode references. When inserting items into the btree, there are two objects,
> the btrfs_item and the item data. These objects must fit within the btree
> nodesize.
> 
> When a hash collision occurs, and we call btrfs_search_slot() to place the
> objects in the tree, when btrfs_search_slot() reaches the leaf node, a check is
> performed to see if we need to split the leaf. The check is incorrect, returning
> that we need to split the leaf, since it thinks that both btrfs_item and the
> item data need to be inserted, when in reality, the item can be merged with the
> existing one and no new btrfs_item will be inserted.
> 
> split_leaf() will return EOVERFLOW from following code:
>      
>    if (extend && data_size + btrfs_item_size_nr(l, slot) +
>        sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>        return -EOVERFLOW;
> 
> In the rename case, btrfs_check_dir_item_collision() is called early stages of
> treewalking, and correctly calculates the needed size, taking into account that
> a hash collision has occurred.
> 
>    data_size = sizeof(*di) + name_len;
>        if (data_size + btrfs_item_size_nr(leaf, slot) +
>            sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
> 
> The two sizes reported from btrfs_check_dir_item_collision() and
> btrfs_search_slot() are different, and rename fails due to split_leaf()
> returning -EOVERFLOW, leading to transaction abort and forcing the volume
> readonly.
> 
> Kernel oops:
> 
> BTRFS: Transaction aborted (error -75)
> WARNING: CPU: 0 PID: 2921 at /build/linux-fTmV3T/linux-4.15.0/fs/btrfs/inode.c:10217 btrfs_rename+0xcf1/0xdf0 [btrfs]
> CPU: 0 PID: 2921 Comm: python3 Not tainted 4.15.0-202-generic #213-Ubuntu
> RIP: 0010:btrfs_rename+0xcf1/0xdf0 [btrfs]
> RSP: 0018:ffff9e6f4183fd20 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff91a493f27b98 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff91a4bfc1b4d0
> RBP: ffff9e6f4183fdc0 R08: 00000000000002b4 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000236
> R13: ffff91a493f56518 R14: ffff91a4b6b57b40 R15: ffff91a493f27b98
> FS:  00007f6041081740(0000) GS:ffff91a4bfc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f6040fe84c8 CR3: 000000015c8ca005 CR4: 0000000000760ef0
> PKRU: 55555554
> Call Trace:
>   btrfs_rename2+0x1d/0x30 [btrfs]
>   vfs_rename+0x46e/0x960
>   SyS_rename+0x362/0x3c0
>   do_syscall_64+0x73/0x130
>   entry_SYSCALL_64_after_hwframe+0x41/0xa6
> Code: 0f ba a8 d0 cd 00 00 02 72 2b 41 83 f8 fb 0f 84 d9 00 00 00 44 89 c6 48 c7 c7 68 43 4b c0 44 89 55 80 44 89 45 98 e8 8f 5c a6 d0 <0f> 0b 44 8b 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98
> ---[ end trace 9c6b87a19f4436f3 ]---
> BTRFS: error (device vdd) in btrfs_rename:10217: errno=-75 unknown
> BTRFS info (device vdd): forced readonly
> 
> [Testcase]
> 
> Start a fresh Bionic or Focal VM.
> 
> Attach two scratch disks, I used standard virtio disks with 3gb of storage each.
> These disks are /dev/vdc and /dev/vdd.
> 
> Compile xfstests:
> 
> $ sudo apt-get install acl attr automake bc dbench dump e2fsprogs fio gawk \
>          gcc git indent libacl1-dev libaio-dev libcap-dev libgdbm-dev libtool \
>          libtool-bin  libuuid1 lvm2 make psmisc python3 quota sed \
>          uuid-dev uuid-runtime xfsprogs linux-headers-$(uname -r) sqlite3 make
> $ sudo apt-get install  f2fs-tools ocfs2-tools udftools xfsdump \
>          xfslibs-dev
> $ git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> $ cd xfstests-dev
> $ make
> $ sudo su
> # mkdir /test
> # mkdir /scratch
> # mkfs.btrfs -f /dev/vdc
> # cat << EOF >> ./local.config
> export TEST_DEV=/dev/vdc
> export TEST_DIR=/test
> export SCRATCH_DEV=/dev/vdd
> export SCRATCH_MNT=/scratch
> EOF
> 
> # ./check btrfs/154
> 
> btrfs/154       _check_dmesg: something found in dmesg (see /home/ubuntu/xfstests-dev/results//btrfs/154.dmesg)
> - output mismatch (see /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad)
>      --- tests/btrfs/154.out	2023-01-28 02:53:03.566450703 +0000
>      +++ /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad	2023-01-28 06:51:34.113121042 +0000
>      @@ -1,2 +1,6 @@
>       QA output created by 154
>      +Traceback (most recent call last):
>      +  File "/home/ubuntu/xfstests-dev/src/btrfs_crc32c_forged_name.py", line 99, in <module>
>      +    os.rename(srcpath, dstpath)
>      +OSError: [Errno 75] Value too large for defined data type: '/scratch/309' -> b'/scratch/fa5d\x90O\x97015d7a47c48fdeb99b857c17e8038da6382fcb05e3a6b367589a8f54a8c3c1327584dfa630b4bd8c5bbb37b4decc2b82fecb4c940e0bd0989166c44e6af2855e9e42a02a57cffa2fc5283525ac53b2b0d2baaf874ff50b'
> Ran: btrfs/154
> Failures: btrfs/154
> Failed 1 of 1 tests
> 
> If you examine dmesg, you will see the oops from the impact section.
> 
> If you install the test kernel from the below ppa:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf349467-btrfs-154
> 
> The issue no longer occurs:
> 
> # ./check btrfs/154
> 
> Ran: btrfs/154
> Passed all 1 tests
> 
> [Fix]
> 
> This was fixed in 5.11-rc3 by the below commit:
> 
> commit 9a664971569daf68254928149f580b4f5856d274
> Author: ethanwu <ethanwu@synology.com>
> Date:   Tue Dec 1 17:25:12 2020 +0800
> Subject: btrfs: correctly calculate item size used when item key collision happens
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a664971569daf68254928149f580b4f5856d274
> 
> This cherry-picks to Focal, and required a small backport to Bionic, removing
> the hunk that contained comments explaining the parameters to
> btrfs_search_slot().
> 
> [Where problems could occur]
> 
> Problems could occur when calculating the size required for btrfs_item and item
> data when hash collisions occur. Such collisions are rare in of itself, but
> possible if you have a large amount files or craft filenames to force collisions
> with the crc32 hash algorithm.
> 
> If a regression were to occur, it could cause more transactions to be aborted,
> and would likely result in the users volume being forced read only. They might
> not lose any existing data, but data being written might be lost.
> 
> ethanwu (1):
>    btrfs: correctly calculate item size used when item key collision
>      happens
> 
>   fs/btrfs/ctree.c       | 24 ++++++++++++++++++++++--
>   fs/btrfs/ctree.h       |  6 ++++++
>   fs/btrfs/extent-tree.c |  2 ++
>   fs/btrfs/file-item.c   |  2 ++
>   4 files changed, 32 insertions(+), 2 deletions(-)
> 
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Luke Nowakowski-Krijger Feb. 23, 2023, 8:18 p.m. UTC | #2
Acked-by: Luke Nowakowski-Krijger <luke.nowakowskikrijger@canonical.com>

On Sun, Jan 29, 2023 at 6:12 PM Matthew Ruffell <
matthew.ruffell@canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/2004132
>
> [Impact]
>
> xfstests btrfs/154 fails on both Bionic and Focal, leading to a kernel
> oops and
> the btrfs volume being forced readonly.
>
> In btrfs, item key collision is allowed for some item types, namely dir
> item and
> inode references. When inserting items into the btree, there are two
> objects,
> the btrfs_item and the item data. These objects must fit within the btree
> nodesize.
>
> When a hash collision occurs, and we call btrfs_search_slot() to place the
> objects in the tree, when btrfs_search_slot() reaches the leaf node, a
> check is
> performed to see if we need to split the leaf. The check is incorrect,
> returning
> that we need to split the leaf, since it thinks that both btrfs_item and
> the
> item data need to be inserted, when in reality, the item can be merged
> with the
> existing one and no new btrfs_item will be inserted.
>
> split_leaf() will return EOVERFLOW from following code:
>
>   if (extend && data_size + btrfs_item_size_nr(l, slot) +
>       sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>       return -EOVERFLOW;
>
> In the rename case, btrfs_check_dir_item_collision() is called early
> stages of
> treewalking, and correctly calculates the needed size, taking into account
> that
> a hash collision has occurred.
>
>   data_size = sizeof(*di) + name_len;
>       if (data_size + btrfs_item_size_nr(leaf, slot) +
>           sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
>
> The two sizes reported from btrfs_check_dir_item_collision() and
> btrfs_search_slot() are different, and rename fails due to split_leaf()
> returning -EOVERFLOW, leading to transaction abort and forcing the volume
> readonly.
>
> Kernel oops:
>
> BTRFS: Transaction aborted (error -75)
> WARNING: CPU: 0 PID: 2921 at
> /build/linux-fTmV3T/linux-4.15.0/fs/btrfs/inode.c:10217
> btrfs_rename+0xcf1/0xdf0 [btrfs]
> CPU: 0 PID: 2921 Comm: python3 Not tainted 4.15.0-202-generic #213-Ubuntu
> RIP: 0010:btrfs_rename+0xcf1/0xdf0 [btrfs]
> RSP: 0018:ffff9e6f4183fd20 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff91a493f27b98 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff91a4bfc1b4d0
> RBP: ffff9e6f4183fdc0 R08: 00000000000002b4 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000236
> R13: ffff91a493f56518 R14: ffff91a4b6b57b40 R15: ffff91a493f27b98
> FS:  00007f6041081740(0000) GS:ffff91a4bfc00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f6040fe84c8 CR3: 000000015c8ca005 CR4: 0000000000760ef0
> PKRU: 55555554
> Call Trace:
>  btrfs_rename2+0x1d/0x30 [btrfs]
>  vfs_rename+0x46e/0x960
>  SyS_rename+0x362/0x3c0
>  do_syscall_64+0x73/0x130
>  entry_SYSCALL_64_after_hwframe+0x41/0xa6
> Code: 0f ba a8 d0 cd 00 00 02 72 2b 41 83 f8 fb 0f 84 d9 00 00 00 44 89 c6
> 48 c7 c7 68 43 4b c0 44 89 55 80 44 89 45 98 e8 8f 5c a6 d0 <0f> 0b 44 8b
> 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98
> ---[ end trace 9c6b87a19f4436f3 ]---
> BTRFS: error (device vdd) in btrfs_rename:10217: errno=-75 unknown
> BTRFS info (device vdd): forced readonly
>
> [Testcase]
>
> Start a fresh Bionic or Focal VM.
>
> Attach two scratch disks, I used standard virtio disks with 3gb of storage
> each.
> These disks are /dev/vdc and /dev/vdd.
>
> Compile xfstests:
>
> $ sudo apt-get install acl attr automake bc dbench dump e2fsprogs fio gawk
> \
>         gcc git indent libacl1-dev libaio-dev libcap-dev libgdbm-dev
> libtool \
>         libtool-bin  libuuid1 lvm2 make psmisc python3 quota sed \
>         uuid-dev uuid-runtime xfsprogs linux-headers-$(uname -r) sqlite3
> make
> $ sudo apt-get install  f2fs-tools ocfs2-tools udftools xfsdump \
>         xfslibs-dev
> $ git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> $ cd xfstests-dev
> $ make
> $ sudo su
> # mkdir /test
> # mkdir /scratch
> # mkfs.btrfs -f /dev/vdc
> # cat << EOF >> ./local.config
> export TEST_DEV=/dev/vdc
> export TEST_DIR=/test
> export SCRATCH_DEV=/dev/vdd
> export SCRATCH_MNT=/scratch
> EOF
>
> # ./check btrfs/154
>
> btrfs/154       _check_dmesg: something found in dmesg (see
> /home/ubuntu/xfstests-dev/results//btrfs/154.dmesg)
> - output mismatch (see
> /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad)
>     --- tests/btrfs/154.out     2023-01-28 02:53:03.566450703 +0000
>     +++ /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad    2023-01-28
> 06:51:34.113121042 +0000
>     @@ -1,2 +1,6 @@
>      QA output created by 154
>     +Traceback (most recent call last):
>     +  File "/home/ubuntu/xfstests-dev/src/btrfs_crc32c_forged_name.py",
> line 99, in <module>
>     +    os.rename(srcpath, dstpath)
>     +OSError: [Errno 75] Value too large for defined data type:
> '/scratch/309' ->
> b'/scratch/fa5d\x90O\x97015d7a47c48fdeb99b857c17e8038da6382fcb05e3a6b367589a8f54a8c3c1327584dfa630b4bd8c5bbb37b4decc2b82fecb4c940e0bd0989166c44e6af2855e9e42a02a57cffa2fc5283525ac53b2b0d2baaf874ff50b'
> Ran: btrfs/154
> Failures: btrfs/154
> Failed 1 of 1 tests
>
> If you examine dmesg, you will see the oops from the impact section.
>
> If you install the test kernel from the below ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf349467-btrfs-154
>
> The issue no longer occurs:
>
> # ./check btrfs/154
>
> Ran: btrfs/154
> Passed all 1 tests
>
> [Fix]
>
> This was fixed in 5.11-rc3 by the below commit:
>
> commit 9a664971569daf68254928149f580b4f5856d274
> Author: ethanwu <ethanwu@synology.com>
> Date:   Tue Dec 1 17:25:12 2020 +0800
> Subject: btrfs: correctly calculate item size used when item key collision
> happens
> Link:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a664971569daf68254928149f580b4f5856d274
>
> This cherry-picks to Focal, and required a small backport to Bionic,
> removing
> the hunk that contained comments explaining the parameters to
> btrfs_search_slot().
>
> [Where problems could occur]
>
> Problems could occur when calculating the size required for btrfs_item and
> item
> data when hash collisions occur. Such collisions are rare in of itself, but
> possible if you have a large amount files or craft filenames to force
> collisions
> with the crc32 hash algorithm.
>
> If a regression were to occur, it could cause more transactions to be
> aborted,
> and would likely result in the users volume being forced read only. They
> might
> not lose any existing data, but data being written might be lost.
>
> ethanwu (1):
>   btrfs: correctly calculate item size used when item key collision
>     happens
>
>  fs/btrfs/ctree.c       | 24 ++++++++++++++++++++++--
>  fs/btrfs/ctree.h       |  6 ++++++
>  fs/btrfs/extent-tree.c |  2 ++
>  fs/btrfs/file-item.c   |  2 ++
>  4 files changed, 32 insertions(+), 2 deletions(-)
>
> --
> 2.37.2
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>
Luke Nowakowski-Krijger Feb. 23, 2023, 8:26 p.m. UTC | #3
Applied to bionic and focal linux master-next

Thanks!
- Luke

On Sun, Jan 29, 2023 at 6:12 PM Matthew Ruffell <
matthew.ruffell@canonical.com> wrote:

> BugLink: https://bugs.launchpad.net/bugs/2004132
>
> [Impact]
>
> xfstests btrfs/154 fails on both Bionic and Focal, leading to a kernel
> oops and
> the btrfs volume being forced readonly.
>
> In btrfs, item key collision is allowed for some item types, namely dir
> item and
> inode references. When inserting items into the btree, there are two
> objects,
> the btrfs_item and the item data. These objects must fit within the btree
> nodesize.
>
> When a hash collision occurs, and we call btrfs_search_slot() to place the
> objects in the tree, when btrfs_search_slot() reaches the leaf node, a
> check is
> performed to see if we need to split the leaf. The check is incorrect,
> returning
> that we need to split the leaf, since it thinks that both btrfs_item and
> the
> item data need to be inserted, when in reality, the item can be merged
> with the
> existing one and no new btrfs_item will be inserted.
>
> split_leaf() will return EOVERFLOW from following code:
>
>   if (extend && data_size + btrfs_item_size_nr(l, slot) +
>       sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>       return -EOVERFLOW;
>
> In the rename case, btrfs_check_dir_item_collision() is called early
> stages of
> treewalking, and correctly calculates the needed size, taking into account
> that
> a hash collision has occurred.
>
>   data_size = sizeof(*di) + name_len;
>       if (data_size + btrfs_item_size_nr(leaf, slot) +
>           sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
>
> The two sizes reported from btrfs_check_dir_item_collision() and
> btrfs_search_slot() are different, and rename fails due to split_leaf()
> returning -EOVERFLOW, leading to transaction abort and forcing the volume
> readonly.
>
> Kernel oops:
>
> BTRFS: Transaction aborted (error -75)
> WARNING: CPU: 0 PID: 2921 at
> /build/linux-fTmV3T/linux-4.15.0/fs/btrfs/inode.c:10217
> btrfs_rename+0xcf1/0xdf0 [btrfs]
> CPU: 0 PID: 2921 Comm: python3 Not tainted 4.15.0-202-generic #213-Ubuntu
> RIP: 0010:btrfs_rename+0xcf1/0xdf0 [btrfs]
> RSP: 0018:ffff9e6f4183fd20 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff91a493f27b98 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff91a4bfc1b4d0
> RBP: ffff9e6f4183fdc0 R08: 00000000000002b4 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000236
> R13: ffff91a493f56518 R14: ffff91a4b6b57b40 R15: ffff91a493f27b98
> FS:  00007f6041081740(0000) GS:ffff91a4bfc00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f6040fe84c8 CR3: 000000015c8ca005 CR4: 0000000000760ef0
> PKRU: 55555554
> Call Trace:
>  btrfs_rename2+0x1d/0x30 [btrfs]
>  vfs_rename+0x46e/0x960
>  SyS_rename+0x362/0x3c0
>  do_syscall_64+0x73/0x130
>  entry_SYSCALL_64_after_hwframe+0x41/0xa6
> Code: 0f ba a8 d0 cd 00 00 02 72 2b 41 83 f8 fb 0f 84 d9 00 00 00 44 89 c6
> 48 c7 c7 68 43 4b c0 44 89 55 80 44 89 45 98 e8 8f 5c a6 d0 <0f> 0b 44 8b
> 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98
> ---[ end trace 9c6b87a19f4436f3 ]---
> BTRFS: error (device vdd) in btrfs_rename:10217: errno=-75 unknown
> BTRFS info (device vdd): forced readonly
>
> [Testcase]
>
> Start a fresh Bionic or Focal VM.
>
> Attach two scratch disks, I used standard virtio disks with 3gb of storage
> each.
> These disks are /dev/vdc and /dev/vdd.
>
> Compile xfstests:
>
> $ sudo apt-get install acl attr automake bc dbench dump e2fsprogs fio gawk
> \
>         gcc git indent libacl1-dev libaio-dev libcap-dev libgdbm-dev
> libtool \
>         libtool-bin  libuuid1 lvm2 make psmisc python3 quota sed \
>         uuid-dev uuid-runtime xfsprogs linux-headers-$(uname -r) sqlite3
> make
> $ sudo apt-get install  f2fs-tools ocfs2-tools udftools xfsdump \
>         xfslibs-dev
> $ git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> $ cd xfstests-dev
> $ make
> $ sudo su
> # mkdir /test
> # mkdir /scratch
> # mkfs.btrfs -f /dev/vdc
> # cat << EOF >> ./local.config
> export TEST_DEV=/dev/vdc
> export TEST_DIR=/test
> export SCRATCH_DEV=/dev/vdd
> export SCRATCH_MNT=/scratch
> EOF
>
> # ./check btrfs/154
>
> btrfs/154       _check_dmesg: something found in dmesg (see
> /home/ubuntu/xfstests-dev/results//btrfs/154.dmesg)
> - output mismatch (see
> /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad)
>     --- tests/btrfs/154.out     2023-01-28 02:53:03.566450703 +0000
>     +++ /home/ubuntu/xfstests-dev/results//btrfs/154.out.bad    2023-01-28
> 06:51:34.113121042 +0000
>     @@ -1,2 +1,6 @@
>      QA output created by 154
>     +Traceback (most recent call last):
>     +  File "/home/ubuntu/xfstests-dev/src/btrfs_crc32c_forged_name.py",
> line 99, in <module>
>     +    os.rename(srcpath, dstpath)
>     +OSError: [Errno 75] Value too large for defined data type:
> '/scratch/309' ->
> b'/scratch/fa5d\x90O\x97015d7a47c48fdeb99b857c17e8038da6382fcb05e3a6b367589a8f54a8c3c1327584dfa630b4bd8c5bbb37b4decc2b82fecb4c940e0bd0989166c44e6af2855e9e42a02a57cffa2fc5283525ac53b2b0d2baaf874ff50b'
> Ran: btrfs/154
> Failures: btrfs/154
> Failed 1 of 1 tests
>
> If you examine dmesg, you will see the oops from the impact section.
>
> If you install the test kernel from the below ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf349467-btrfs-154
>
> The issue no longer occurs:
>
> # ./check btrfs/154
>
> Ran: btrfs/154
> Passed all 1 tests
>
> [Fix]
>
> This was fixed in 5.11-rc3 by the below commit:
>
> commit 9a664971569daf68254928149f580b4f5856d274
> Author: ethanwu <ethanwu@synology.com>
> Date:   Tue Dec 1 17:25:12 2020 +0800
> Subject: btrfs: correctly calculate item size used when item key collision
> happens
> Link:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a664971569daf68254928149f580b4f5856d274
>
> This cherry-picks to Focal, and required a small backport to Bionic,
> removing
> the hunk that contained comments explaining the parameters to
> btrfs_search_slot().
>
> [Where problems could occur]
>
> Problems could occur when calculating the size required for btrfs_item and
> item
> data when hash collisions occur. Such collisions are rare in of itself, but
> possible if you have a large amount files or craft filenames to force
> collisions
> with the crc32 hash algorithm.
>
> If a regression were to occur, it could cause more transactions to be
> aborted,
> and would likely result in the users volume being forced read only. They
> might
> not lose any existing data, but data being written might be lost.
>
> ethanwu (1):
>   btrfs: correctly calculate item size used when item key collision
>     happens
>
>  fs/btrfs/ctree.c       | 24 ++++++++++++++++++++++--
>  fs/btrfs/ctree.h       |  6 ++++++
>  fs/btrfs/extent-tree.c |  2 ++
>  fs/btrfs/file-item.c   |  2 ++
>  4 files changed, 32 insertions(+), 2 deletions(-)
>
> --
> 2.37.2
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>