
fs/btrfs: handle data extents which cross stripe boundaries correctly

Message ID 57ad584676de5b72ae422a22dc36922285405291.1672362424.git.wqu@suse.com
State Accepted
Delegated to: Tom Rini
Series fs/btrfs: handle data extents which cross stripe boundaries correctly

Commit Message

Qu Wenruo Dec. 30, 2022, 1:07 a.m. UTC
[BUG]
Since btrfs-progs v5.14, mkfs.btrfs supports creating single-device RAID0
filesystems. If we create such a filesystem and then write a file crossing
a stripe boundary:

  # mkfs.btrfs -m dup -d raid0 test.img
  # mount test.img mnt
  # xfs_io -f -c "pwrite 0 128K" mnt/file
  # umount mnt

Since btrfs uses a 64K stripe length, the above 128K write is guaranteed to
cross at least one stripe boundary.

U-Boot then fails to read the above 128K file:

 => host bind 0 /home/adam/test.img
 => ls host 0
 <   >     131072  Fri Dec 30 00:18:25 2022  file
 => load host 0 0 file
 BTRFS: An error occurred while reading file file
 Failed to load 'file'

[CAUSE]
Unlike tree block reads, data extent reads do not consider the case where a
single data extent crosses a stripe boundary.

In read_extent_data(), we just call btrfs_map_block() once and read the
first mapped range.

If the first mapped range is smaller than the desired range, we return an
error.

But since even a single-device btrfs can use the RAID0 profile, the first
mapped range is at most 64K (one stripe) for RAID0, which turns the 128K
read above into a false error.

[FIX]
Just like read_whole_eb(), we should call btrfs_map_block() in a loop
until we read all data.
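
The core of the fix is a loop along these lines (a simplified sketch of the
patch below, with the missing-device check and the error messages omitted):

  while (cur < logical + orig_len) {
          u64 cur_len = logical + orig_len - cur;

          /* btrfs_map_block() may trim cur_len down to one stripe */
          ret = btrfs_map_block(fs_info, READ, cur, &cur_len, &multi,
                                mirror, NULL);
          if (ret)
                  break;

          /* read the mapped stripe into the matching buffer offset */
          ret = __btrfs_devread(multi->stripes[0].dev->desc,
                                multi->stripes[0].dev->part,
                                data + (cur - logical), cur_len,
                                multi->stripes[0].physical);
          if (ret != cur_len) {
                  ret = -EIO;
                  break;
          }
          cur += cur_len;
          ret = 0;
  }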

Since we're here, also add extra error messages for the following cases:

- btrfs_map_block() failure
  We already have the error message for it.

- Missing device
  This should not happen, as we only support single device for now.

- __btrfs_devread() failure

With this bug fixed, the U-Boot btrfs driver can properly read the above
128K file and gets the correct content:

 => host bind 0 /home/adam/test.img
 => ls host 0
 <   >     131072  Fri Dec 30 00:18:25 2022  file
 => load host 0 0 file
 131072 bytes read in 0 ms
 => md5sum 0 0x20000
 md5 for 00000000 ... 0001ffff ==> d48858312a922db7eb86377f638dbc9f
 ^^^ The above md5sum matches the original file.

Reported-by: Sam Winchenbach <swichenbach@tethers.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 49 +++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

Comments

Tom Rini Jan. 12, 2023, 3:19 p.m. UTC | #1
On Fri, Dec 30, 2022 at 09:07:05AM +0800, Qu Wenruo wrote:

> [...]

Applied to u-boot/master, thanks!
Andreas Schwab Feb. 12, 2023, 3:37 p.m. UTC | #2
On Dec 30 2022, Qu Wenruo wrote:

> [...]

This breaks btrfs on the HiFive Unmatched.

=> pci enum
PCIE-0: Link up (Gen1-x8, Bus0)
=> nvme scan
=> load nvme 0:2 0x8c000000 /boot/dtb/sifive/hifive-unmatched-a00.dtb
[hangs]
Andreas Schwab Feb. 12, 2023, 4:01 p.m. UTC | #3
The length returned by btrfs_map_block is clearly bogus:

read_extent_data: cur=615817216, orig_len=16384, cur_len=16384
read_extent_data: btrfs_map_block: cur_len=479944704; ret=0
read_extent_data: ret=0
read_extent_data: cur=615833600, orig_len=4096, cur_len=4096
read_extent_data: btrfs_map_block: cur_len=479928320; ret=0
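
(For context: the lines above come from ad-hoc debug prints, not from the
patch itself. A hypothetical sketch of instrumentation inside
read_extent_data() that would produce output in this format:)

  printf("%s: cur=%llu, orig_len=%llu, cur_len=%llu\n",
         __func__, cur, orig_len, cur_len);
  ret = btrfs_map_block(fs_info, READ, cur, &cur_len, &multi, mirror, NULL);
  printf("%s: btrfs_map_block: cur_len=%llu; ret=%d\n",
         __func__, cur_len, ret);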
Andreas Schwab Feb. 12, 2023, 4:20 p.m. UTC | #4
When I print ce->size in __btrfs_map_block, it is almost always
1073741824, which looks bogus.
Qu Wenruo Feb. 13, 2023, 12:19 a.m. UTC | #5
On 2023/2/13 00:20, Andreas Schwab wrote:
> When I print ce->size in __btrfs_map_block, it is almost always
> 1073741824, which looks bogus.
> 
Can you provide the image of that filesystem?

Thanks,
Qu
Andreas Schwab Feb. 13, 2023, 6:25 p.m. UTC | #6
On Feb 13 2023, Qu Wenruo wrote:

> On 2023/2/13 00:20, Andreas Schwab wrote:
>> When I print ce->size in __btrfs_map_block, it is almost always
>> 1073741824, which looks bogus.
>> 
> Can you provide the image of that filesystem?

How do I do that?

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f0d9f1c113b..7eaa7e949604 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -541,34 +541,39 @@  struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
 		     u64 *len, int mirror)
 {
-	u64 offset = 0;
+	u64 orig_len = *len;
+	u64 cur = logical;
 	struct btrfs_multi_bio *multi = NULL;
 	struct btrfs_device *device;
 	int ret = 0;
-	u64 max_len = *len;
 
-	ret = btrfs_map_block(fs_info, READ, logical, len, &multi, mirror,
-			      NULL);
-	if (ret) {
-		fprintf(stderr, "Couldn't map the block %llu\n",
-				logical + offset);
-		goto err;
-	}
-	device = multi->stripes[0].dev;
+	while (cur < logical + orig_len) {
+		u64 cur_len = logical + orig_len - cur;
 
-	if (*len > max_len)
-		*len = max_len;
-	if (!device->desc || !device->part) {
-		ret = -EIO;
-		goto err;
-	}
-
-	ret = __btrfs_devread(device->desc, device->part, data, *len,
-			      multi->stripes[0].physical);
-	if (ret != *len)
-		ret = -EIO;
-	else
+		ret = btrfs_map_block(fs_info, READ, cur, &cur_len, &multi,
+				      mirror, NULL);
+		if (ret) {
+			error("Couldn't map the block %llu", cur);
+			goto err;
+		}
+		device = multi->stripes[0].dev;
+		if (!device->desc || !device->part) {
+			error("devid %llu is missing", device->devid);
+			ret = -EIO;
+			goto err;
+		}
+		ret = __btrfs_devread(device->desc, device->part,
+				data + (cur - logical), cur_len,
+				multi->stripes[0].physical);
+		if (ret != cur_len) {
+			error("read failed on devid %llu physical %llu",
+			      device->devid, multi->stripes[0].physical);
+			ret = -EIO;
+			goto err;
+		}
+		cur += cur_len;
 		ret = 0;
+	}
 err:
 	kfree(multi);
 	return ret;