Message ID | 20200418233231.z767yvfiupy7hwgp@xzhoux.usersys.redhat.com |
---|---|
State | New |
Headers | show |
Series | ext4: validate fiemap iomap begin offset and length value | expand |
++ mailing list. Sorry somehow it got dropped. On 4/19/20 7:21 AM, Ritesh Harjani wrote: > Hello Murphy, > > I guess the patch to fix this issue was recently submitted. > Could you please test your reproducer, xfstest and ltp > tests on below patch too. And let me know if we can add your Tested-by: > > https://patchwork.ozlabs.org/project/linux-ext4/patch/1a2dc8f198e1225ddd40833de76b60c7ee20d22d.1587024137.git.riteshh@linux.ibm.com/ > > > -ritesh > > On 4/19/20 5:02 AM, Murphy Zhou wrote: >> Sometimes crazy userspace values can be here causing overflow issue. >> >> After moved ext4_fiemap to using the iomap framework in >> commit d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") >> we can hit the WARN_ON at fs/iomap/apply.c:51, then get an EIO error >> running xfstests generic/009 (and some others) on ext4 based overlayfs. >> >> The minimal reproducer is: >> ------------------------------------- >> fallocate -l 256M test.img >> mkfs.ext4 -Fq -b 4096 -I 256 test.img >> mkdir -p test >> mount -o loop test.img test || exit >> pushd test >> rm -rf l u w m >> mkdir -p l u w m >> mount -t overlay -o lowerdir=l,upperdir=u,workdir=w overlay m || exit >> xfs_io -f -c "pwrite 0 4096" -c "fiemap" m/tf >> umount m >> rm -rf l u w m >> popd >> umount -d test >> rm -rf test test.img >> ------------------------------------- >> >> Because we run fiemap command wo/ the offset and length parameters, >> xfs_io set values based on fs blocksize etc which is got from >> the mounted fs. These values xfs_io passed are way larger on overlayfs >> than ext4 directly. So we can't reproduce this directly on ext4 or xfs. >> I tried to call ioctl directly with large length value but failed to >> reproduce this. >> >> I did not try to get what values xfs_io exactly passing in, but I >> confirmed that overflowed value when it made into _ext4_fiemap. >> It's a length of 0x7fffffffffffffff which will mess up the calculation >> of map.m_lblk and map.m_len, make map.m_len to be 0, then hit WARN_ON >> and get EIO in iomap_apply. >> >> Fixing this by ensuring the offset and length values wont exceed >> EXT4_MAX_LOGICAL_BLOCK. Also make sure that the length would not >> be zero because of crazy overflowed values. >> >> This patch has been tested with LTP/xfstests showing no new issue. >> >> Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com> >> Fixes: d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") >> --- >> fs/ext4/inode.c | 17 ++++++++++++++--- >> 1 file changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index e416096..3620417 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -3523,6 +3523,8 @@ static int ext4_iomap_begin_report(struct inode >> *inode, loff_t offset, >> int ret; >> bool delalloc = false; >> struct ext4_map_blocks map; >> + ext4_lblk_t last_lblk; >> + ext4_lblk_t lblk; >> u8 blkbits = inode->i_blkbits; >> if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >> @@ -3540,9 +3542,18 @@ static int ext4_iomap_begin_report(struct inode >> *inode, loff_t offset, >> /* >> * Calculate the first and last logical block respectively. >> */ >> - map.m_lblk = offset >> blkbits; >> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >> + lblk = offset >> blkbits; >> + last_lblk = (offset + length - 1) >> blkbits; >> + >> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >> + >> + map.m_lblk = lblk; >> + map.m_len = last_lblk - lblk + 1; >> + if (map.m_len == 0 ) >> + map.m_len = 1; >> /* >> * Fiemap callers may call for offset beyond s_bitmap_maxbytes. >>
On Sun, Apr 19, 2020 at 07:26:53AM +0530, Ritesh Harjani wrote: > ++ mailing list. > Sorry somehow it got dropped. > > > On 4/19/20 7:21 AM, Ritesh Harjani wrote: > > Hello Murphy, > > > > I guess the patch to fix this issue was recently submitted. > > Could you please test your reproducer, xfstest and ltp > > tests on below patch too. And let me know if we can add your Tested-by: > > > > https://patchwork.ozlabs.org/project/linux-ext4/patch/1a2dc8f198e1225ddd40833de76b60c7ee20d22d.1587024137.git.riteshh@linux.ibm.com/ His reproducer is still failing with your patch. In order to for his reproducer to succeed, we need to constrain lblk and last_lblk more strictly than what is done in: [PATCHv2 1/1] ext4: fix overflow case for map.m_len in ext4_iomap_begin_* His patch does fix the issue. ext4_map_block() is returning EFSCORRUPTED when lblk is EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, but it looks we have some overflow/wraparound issue when lblk is 0xFFFFFFFF. Which might mean that in fact EXT4_MAX_LOGICAL_BLOCK might need to be 0xFFFFFFFE, or we need to look very closely our code paths to make sure the right thing happes when we call ext4_map_blocks() with m_lblk == 0xFFFFFFFF and m_len == 1. I think we need to take his patch, and make a simialr change to ext4_iomap_begin(). Ritesh, do you agree? - Ted
On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: > I think we need to take his patch, and make a simialr change to > ext4_iomap_begin(). Ritesh, do you agree? For example... diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 2a4aae6acdcb..adce3339d697 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, int ret; struct ext4_map_blocks map; u8 blkbits = inode->i_blkbits; + ext4_lblk_t lblk = offset >> blkbits; + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + if (lblk > EXT4_MAX_LOGICAL_BLOCK) return -EINVAL; if (WARN_ON_ONCE(ext4_has_inline_data(inode))) @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, /* * Calculate the first and last logical blocks respectively. */ - map.m_lblk = offset >> blkbits; - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + + map.m_lblk = lblk; + map.m_len = last_lblk - lblk + 1; + if (map.m_len == 0 ) + map.m_len = 1; if (flags & IOMAP_WRITE) ret = ext4_iomap_alloc(inode, &map, flags); @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, bool delalloc = false; struct ext4_map_blocks map; u8 blkbits = inode->i_blkbits; + ext4_lblk_t lblk = offset >> blkbits; + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + if (lblk > EXT4_MAX_LOGICAL_BLOCK) return -EINVAL; if (ext4_has_inline_data(inode)) { @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, /* * Calculate the first and last logical block respectively. */ - map.m_lblk = offset >> blkbits; - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + + map.m_lblk = lblk; + map.m_len = last_lblk - lblk + 1; + if (map.m_len == 0 ) + map.m_len = 1; /* * Fiemap callers may call for offset beyond s_bitmap_maxbytes.
Hello Ted, On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: > ext4_map_block() is returning EFSCORRUPTED when lblk is > EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to > EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, Yes, I did mention about this case in point 2 in below link though. But maybe I was only focused on testing syzcaller reproducer, so couldn't test this reported case. https://www.spinics.net/lists/linux-ext4/msg71387.html > On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: >> I think we need to take his patch, and make a simialr change to >> ext4_iomap_begin(). Ritesh, do you agree? > > For example... > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 2a4aae6acdcb..adce3339d697 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > int ret; > struct ext4_map_blocks map; > u8 blkbits = inode->i_blkbits; > + ext4_lblk_t lblk = offset >> blkbits; > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; Why play with last_lblk but? > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > return -EINVAL; > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) > @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > /* > * Calculate the first and last logical blocks respectively. > */ > - map.m_lblk = offset >> blkbits; > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > + > + map.m_lblk = lblk; > + map.m_len = last_lblk - lblk + 1; > + if (map.m_len == 0 ) > + map.m_len = 1; Not sure but with above changes map.m_len will never be 0. Right? Ok, so the problem mainly is coming since ext4_map_blocks() is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. So why change last_lblk? Shouldn't we just change the logic to return -ENOENT in case if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by IOMAP APIs to abort the loop properly. This along with the map.m_len overlflow patch which I had submitted before. (since the overflow patch is anyway a valid fix which we anyways need). -ritesh > > if (flags & IOMAP_WRITE) > ret = ext4_iomap_alloc(inode, &map, flags); > @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > bool delalloc = false; > struct ext4_map_blocks map; > u8 blkbits = inode->i_blkbits; > + ext4_lblk_t lblk = offset >> blkbits; > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > return -EINVAL; > > if (ext4_has_inline_data(inode)) { > @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > /* > * Calculate the first and last logical block respectively. > */ > - map.m_lblk = offset >> blkbits; > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > + > + map.m_lblk = lblk; > + map.m_len = last_lblk - lblk + 1; > + if (map.m_len == 0 ) > + map.m_len = 1; > > /* > * Fiemap callers may call for offset beyond s_bitmap_maxbytes. >
On Sun, Apr 19, 2020 at 09:49:27PM +0530, Ritesh Harjani wrote: > Hello Ted, > > On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: > > > ext4_map_block() is returning EFSCORRUPTED when lblk is > > EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to > > EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, > > Yes, I did mention about this case in point 2 in below link though. > But maybe I was only focused on testing syzcaller reproducer, so > couldn't test this reported case. > > https://www.spinics.net/lists/linux-ext4/msg71387.html > > > > On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: > > > I think we need to take his patch, and make a simialr change to > > > ext4_iomap_begin(). Ritesh, do you agree? > > > > For example... > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 2a4aae6acdcb..adce3339d697 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > > int ret; > > struct ext4_map_blocks map; > > u8 blkbits = inode->i_blkbits; > > + ext4_lblk_t lblk = offset >> blkbits; > > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; > > Why play with last_lblk but? > > > > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > > return -EINVAL; > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) > > @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > > /* > > * Calculate the first and last logical blocks respectively. > > */ > > - map.m_lblk = offset >> blkbits; > > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > + > > + map.m_lblk = lblk; > > + map.m_len = last_lblk - lblk + 1; > > + if (map.m_len == 0 ) > > + map.m_len = 1; > > Not sure but with above changes map.m_len will never be > 0. Right? Yes. If it's 0, in ext4_iomap_is_delalloc we will get an "end" that is less then m_lblk, causing another WARN in ext4_es_find_extent_range. > > Ok, so the problem mainly is coming since ext4_map_blocks() > is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. > > So why change last_lblk? I guess because we need to make sure a sane length value. In the loop in iomap_fiemap, start and length are not checked, assuming be checked by caller. If length get overflowed, the start value for the next loop can also be affected, which makes lblk last_lblk and m_len to go crazy. Thanks. > Shouldn't we just change the logic to return -ENOENT in case > if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by > IOMAP APIs to abort the loop properly. > This along with the map.m_len overlflow patch which I had submitted > before. (since the overflow patch is anyway a valid fix which we anyways > need). > > -ritesh > > > > if (flags & IOMAP_WRITE) > > ret = ext4_iomap_alloc(inode, &map, flags); > > @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > > bool delalloc = false; > > struct ext4_map_blocks map; > > u8 blkbits = inode->i_blkbits; > > + ext4_lblk_t lblk = offset >> blkbits; > > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > > return -EINVAL; > > if (ext4_has_inline_data(inode)) { > > @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > > /* > > * Calculate the first and last logical block respectively. > > */ > > - map.m_lblk = offset >> blkbits; > > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > + > > + map.m_lblk = lblk; > > + map.m_len = last_lblk - lblk + 1; > > + if (map.m_len == 0 ) > > + map.m_len = 1; > > /* > > * Fiemap callers may call for offset beyond s_bitmap_maxbytes. > > >
On 4/20/20 8:27 AM, Murphy Zhou wrote: > On Sun, Apr 19, 2020 at 09:49:27PM +0530, Ritesh Harjani wrote: >> Hello Ted, >> >> On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: >> >>> ext4_map_block() is returning EFSCORRUPTED when lblk is >>> EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to >>> EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, >> >> Yes, I did mention about this case in point 2 in below link though. >> But maybe I was only focused on testing syzcaller reproducer, so >> couldn't test this reported case. >> >> https://www.spinics.net/lists/linux-ext4/msg71387.html >> >> >>> On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: >>>> I think we need to take his patch, and make a simialr change to >>>> ext4_iomap_begin(). Ritesh, do you agree? >>> >>> For example... >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index 2a4aae6acdcb..adce3339d697 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >>> int ret; >>> struct ext4_map_blocks map; >>> u8 blkbits = inode->i_blkbits; >>> + ext4_lblk_t lblk = offset >> blkbits; >>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >> >> Why play with last_lblk but? >> >> >> >>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>> return -EINVAL; >>> if (WARN_ON_ONCE(ext4_has_inline_data(inode))) >>> @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >>> /* >>> * Calculate the first and last logical blocks respectively. >>> */ >>> - map.m_lblk = offset >> blkbits; >>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>> + >>> + map.m_lblk = lblk; >>> + map.m_len = last_lblk - lblk + 1; >>> + if (map.m_len == 0 ) >>> + map.m_len = 1; >> >> Not sure but with above changes map.m_len will never be >> 0. Right? > > Yes. If it's 0, in ext4_iomap_is_delalloc we will get an "end" that > is less then m_lblk, causing another WARN in ext4_es_find_extent_range. Sorry lost you. Ok so what I meant above is. With your changes made in above code to truncate last_lblk and lblk, we may never end up in a situation where map.m_len will be 0. So the below check in your code, isn't it redundant? I wanted to double confirm this with you. + if (map.m_len == 0 ) + map.m_len = 1; > >> >> Ok, so the problem mainly is coming since ext4_map_blocks() >> is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. >> >> So why change last_lblk? > > I guess because we need to make sure a sane length value. In the loop > in iomap_fiemap, start and length are not checked, assuming be checked > by caller. If length get overflowed, the start value for the next loop > can also be affected, which makes lblk last_lblk and m_len to go crazy. Sorry I didn't it explain it right maybe. So if we are anyway changing lblk by truncating it and making sure map.m_len is not getting overflowed (as we did in my previous patch), then we need not play with last_lblk anyways. And FWIW, instead of truncating lblk just so that ext4_map_blocks() doesn't WARN, we can as well just return -ENOENT for lblk >= EXT4_MAX_LOGICAL_BLOCK. ENOENT makes more sense to me, but please feel free to correct me here. Thoughts? Meanwhile, I will also play this change (-ENOENT) a bit to at least get few of the known test cases covered. Also I do had this question for ext4. EXT4_MAX_BLOCKS explaination says that's the max *number* of logical blocks in a file. So since it is the number of blocks, it is equivalent of length. Whereas the EXT4_MAX_LOGICAL_BLOCK says the max logical block of a file, which is equivalent of offset. Considering the logical offset starts from 0, so as Ted was saying having both values same doesn't make sense. Ideally maybe EXT4_MAX_LOGICAL_BLOCK should be 0xFFFFFFFFE. But that may also require some careful checking of all bounds of length and offset across the code. So maybe we can revisit this later. /* * Maximum number of logical blocks in a file; ext4_extent's ee_block is * __le32. */ #define EXT_MAX_BLOCKS 0xffffffff /* Max logical block we can support */ #define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFF -ritesh > > Thanks. > >> Shouldn't we just change the logic to return -ENOENT in case >> if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by >> IOMAP APIs to abort the loop properly. >> This along with the map.m_len overlflow patch which I had submitted >> before. (since the overflow patch is anyway a valid fix which we anyways >> need). >> >> -ritesh >> >> >>> if (flags & IOMAP_WRITE) >>> ret = ext4_iomap_alloc(inode, &map, flags); >>> @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, >>> bool delalloc = false; >>> struct ext4_map_blocks map; >>> u8 blkbits = inode->i_blkbits; >>> + ext4_lblk_t lblk = offset >> blkbits; >>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>> return -EINVAL; >>> if (ext4_has_inline_data(inode)) { >>> @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, >>> /* >>> * Calculate the first and last logical block respectively. >>> */ >>> - map.m_lblk = offset >> blkbits; >>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>> + >>> + map.m_lblk = lblk; >>> + map.m_len = last_lblk - lblk + 1; >>> + if (map.m_len == 0 ) >>> + map.m_len = 1; >>> /* >>> * Fiemap callers may call for offset beyond s_bitmap_maxbytes. >>> >> >
On 4/20/20 9:46 AM, Ritesh Harjani wrote: > > > On 4/20/20 8:27 AM, Murphy Zhou wrote: >> On Sun, Apr 19, 2020 at 09:49:27PM +0530, Ritesh Harjani wrote: >>> Hello Ted, >>> >>> On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: >>> >>>> ext4_map_block() is returning EFSCORRUPTED when lblk is >>>> EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to >>>> EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, >>> >>> Yes, I did mention about this case in point 2 in below link though. >>> But maybe I was only focused on testing syzcaller reproducer, so >>> couldn't test this reported case. >>> >>> https://www.spinics.net/lists/linux-ext4/msg71387.html >>> >>> >>>> On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: >>>>> I think we need to take his patch, and make a simialr change to >>>>> ext4_iomap_begin(). Ritesh, do you agree? >>>> >>>> For example... >>>> >>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>> index 2a4aae6acdcb..adce3339d697 100644 >>>> --- a/fs/ext4/inode.c >>>> +++ b/fs/ext4/inode.c >>>> @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode >>>> *inode, loff_t offset, loff_t length, >>>> int ret; >>>> struct ext4_map_blocks map; >>>> u8 blkbits = inode->i_blkbits; >>>> + ext4_lblk_t lblk = offset >> blkbits; >>>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >>> >>> Why play with last_lblk but? >>> >>> >>> >>>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>>> return -EINVAL; >>>> if (WARN_ON_ONCE(ext4_has_inline_data(inode))) >>>> @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode >>>> *inode, loff_t offset, loff_t length, >>>> /* >>>> * Calculate the first and last logical blocks respectively. >>>> */ >>>> - map.m_lblk = offset >> blkbits; >>>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>> + >>>> + map.m_lblk = lblk; >>>> + map.m_len = last_lblk - lblk + 1; >>>> + if (map.m_len == 0 ) >>>> + map.m_len = 1; >>> >>> Not sure but with above changes map.m_len will never be >>> 0. Right? >> >> Yes. If it's 0, in ext4_iomap_is_delalloc we will get an "end" that >> is less then m_lblk, causing another WARN in ext4_es_find_extent_range. > > Sorry lost you. Ok so what I meant above is. > With your changes made in above code to truncate last_lblk > and lblk, we may never end up in a situation where map.m_len will be 0. > So the below check in your code, isn't it redundant? > I wanted to double confirm this with you. > > + if (map.m_len == 0 ) > + map.m_len = 1; > > >> >>> >>> Ok, so the problem mainly is coming since ext4_map_blocks() >>> is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. >>> >>> So why change last_lblk? >> >> I guess because we need to make sure a sane length value. In the loop >> in iomap_fiemap, start and length are not checked, assuming be checked >> by caller. If length get overflowed, the start value for the next loop >> can also be affected, which makes lblk last_lblk and m_len to go crazy. > > Sorry I didn't it explain it right maybe. So if we are anyway changing > lblk by truncating it and making sure map.m_len is not getting > overflowed (as we did in my previous patch), then we need not play with > last_lblk anyways. > > And FWIW, instead of truncating lblk just so that ext4_map_blocks() > doesn't WARN, we can as well just return -ENOENT for > lblk >= EXT4_MAX_LOGICAL_BLOCK. ENOENT makes more sense to me, > but please feel free to correct me here. > > Thoughts? > > Meanwhile, I will also play this change (-ENOENT) a bit to at least get > few of the known test cases covered. > > > Also I do had this question for ext4. > EXT4_MAX_BLOCKS explaination says that's the max *number* of logical > blocks in a file. So since it is the number of blocks, it is equivalent > of length. Whereas the EXT4_MAX_LOGICAL_BLOCK says the max logical block > of a file, which is equivalent of offset. > Considering the logical offset starts from 0, so as Ted was saying > having both values same doesn't make sense. Ideally maybe > EXT4_MAX_LOGICAL_BLOCK should be 0xFFFFFFFFE. > > But that may also require some careful checking of all bounds of length > and offset across the code. So maybe we can revisit this later. > /* > * Maximum number of logical blocks in a file; ext4_extent's ee_block is > * __le32. > */ > #define EXT_MAX_BLOCKS 0xffffffff > > > /* Max logical block we can support */ > #define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFF After doing some more careful review of code to find out why we return -EFSCORRUPTED from ext4_map_blocks(). I think the reason maybe this:- In case if we have a file with an extent at last logical block of file (which ext4 can support i.e. 0xFFFFFFFE) of length 1. In that case if some tries to call for ext4_map_blocks() for lblk of 0xFFFFFFFF for length 1, then it will fall over below logic condition in ext4_map_blocks (of course will happen if we comment out the logic to return -EFSCORRUPTED from ext4_map_blocks). 4109 /* 4110 * requested block isn't allocated yet; 4111 * we couldn't try to create block if create flag is zero 4112 */ 4113 if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) { 4114 ext4_lblk_t hole_start, hole_len; 4115 4116 hole_start = map->m_lblk; 4117 hole_len = ext4_ext_determine_hole(inode, path, &hole_start); 4118 /* 4119 * put just found gap into cache to speed up 4120 * subsequent requests 4121 */ 4122 ext4_ext_put_gap_in_cache(inode, hole_start, hole_len); 4123 4124 /* Update hole_len to reflect hole size after map->m_lblk */ 4125 if (hole_start != map->m_lblk) 4126 hole_len -= map->m_lblk - hole_start; 4127 map->m_pblk = 0; 4128 map->m_len = min_t(unsigned int, map->m_len, hole_len); 4129 4130 goto out2; 4131 } In here we will try and determine the hole_start and hole_len to put the gap in ext_status cache. "Note that the path which is determined in above is the path for the last extent found in the file." So while trying to determine the hole_start and hole_len, we go into ext4_ext_determine_hole() -> ext4_ext_next_allocated_block() Now since there is no next allocated block, so in that case that function returns EXT_MAX_BLOCKS. And therefore we may hit the BUG_ON in below function. 2202 } else if (*lblk >= le32_to_cpu(ex->ee_block) 2203 + ext4_ext_get_actual_len(ex)) { 2204 ext4_lblk_t next; 2205 2206 *lblk = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex); 2207 next = ext4_ext_next_allocated_block(path); 2208 BUG_ON(next == *lblk); ==> We may hit here. 2209 len = next - *lblk; 2210 } else { 2211 BUG(); 2212 } 2213 return len; Now looking at above, I think below code should be the right fix for this issue. But pls help correct if you think otherwise. We need not take the previous m_len overflow fix. Since the length won't overflow with below change in (EXT4_MAX_LOGICAL_BLOCK). For now, I have tested the 2 known reproducers with this patch alone. Those were fine with this change. diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 91eb4381cae5..ad2dbf6e4924 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -722,7 +722,7 @@ enum { #define EXT4_MAX_BLOCK_FILE_PHYS 0xFFFFFFFF /* Max logical block we can support */ -#define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFF +#define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFE /* * Structure of an inode on the disk diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9c7b1bad0cd6..e7c0ec58ec98 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3426,7 +3426,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, u8 blkbits = inode->i_blkbits; if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) - return -EINVAL; + return -ENOENT; if (WARN_ON_ONCE(ext4_has_inline_data(inode))) return -ERANGE; @@ -3526,7 +3526,7 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, u8 blkbits = inode->i_blkbits; if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) - return -EINVAL; + return -ENOENT; if (ext4_has_inline_data(inode)) { ret = ext4_inline_data_iomap(inode, iomap); -ritesh >> >> Thanks. >> >>> Shouldn't we just change the logic to return -ENOENT in case >>> if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by >>> IOMAP APIs to abort the loop properly. >>> This along with the map.m_len overlflow patch which I had submitted >>> before. (since the overflow patch is anyway a valid fix which we anyways >>> need). >>> >>> -ritesh >>> >>> >>>> if (flags & IOMAP_WRITE) >>>> ret = ext4_iomap_alloc(inode, &map, flags); >>>> @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct >>>> inode *inode, loff_t offset, >>>> bool delalloc = false; >>>> struct ext4_map_blocks map; >>>> u8 blkbits = inode->i_blkbits; >>>> + ext4_lblk_t lblk = offset >> blkbits; >>>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >>>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>>> return -EINVAL; >>>> if (ext4_has_inline_data(inode)) { >>>> @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct >>>> inode *inode, loff_t offset, >>>> /* >>>> * Calculate the first and last logical block respectively. >>>> */ >>>> - map.m_lblk = offset >> blkbits; >>>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>> + >>>> + map.m_lblk = lblk; >>>> + map.m_len = last_lblk - lblk + 1; >>>> + if (map.m_len == 0 ) >>>> + map.m_len = 1; >>>> /* >>>> * Fiemap callers may call for offset beyond s_bitmap_maxbytes. >>>> >>> >> >
On Mon, Apr 20, 2020 at 09:46:01AM +0530, Ritesh Harjani wrote: > > > On 4/20/20 8:27 AM, Murphy Zhou wrote: > > On Sun, Apr 19, 2020 at 09:49:27PM +0530, Ritesh Harjani wrote: > > > Hello Ted, > > > > > > On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: > > > > > > > ext4_map_block() is returning EFSCORRUPTED when lblk is > > > > EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to > > > > EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, > > > > > > Yes, I did mention about this case in point 2 in below link though. > > > But maybe I was only focused on testing syzcaller reproducer, so > > > couldn't test this reported case. > > > > > > https://www.spinics.net/lists/linux-ext4/msg71387.html > > > > > > > > > > On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: > > > > > I think we need to take his patch, and make a simialr change to > > > > > ext4_iomap_begin(). Ritesh, do you agree? > > > > > > > > For example... > > > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > > > index 2a4aae6acdcb..adce3339d697 100644 > > > > --- a/fs/ext4/inode.c > > > > +++ b/fs/ext4/inode.c > > > > @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > > > > int ret; > > > > struct ext4_map_blocks map; > > > > u8 blkbits = inode->i_blkbits; > > > > + ext4_lblk_t lblk = offset >> blkbits; > > > > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; > > > > > > Why play with last_lblk but? > > > > > > > > > > > > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > > > > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > > > > return -EINVAL; > > > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) > > > > @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, > > > > /* > > > > * Calculate the first and last logical blocks respectively. > > > > */ > > > > - map.m_lblk = offset >> blkbits; > > > > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > > > > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > > > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > > > > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > > > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > > > > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > > > + > > > > + map.m_lblk = lblk; > > > > + map.m_len = last_lblk - lblk + 1; > > > > + if (map.m_len == 0 ) > > > > + map.m_len = 1; > > > > > > Not sure but with above changes map.m_len will never be > > > 0. Right? > > > > Yes. If it's 0, in ext4_iomap_is_delalloc we will get an "end" that > > is less then m_lblk, causing another WARN in ext4_es_find_extent_range. > > Sorry lost you. Ok so what I meant above is. > With your changes made in above code to truncate last_lblk > and lblk, we may never end up in a situation where map.m_len will be 0. > So the below check in your code, isn't it redundant? > I wanted to double confirm this with you. > > + if (map.m_len == 0 ) > + map.m_len = 1; No it's not redundant. I hit and said that wo/ these two lines we will hit a WARN later. At first I thought truncating values is enough, but it's not. generic/013 (fsstress) can hit the WARN in fs/ext4/extents_status.c:266 easily. By printk values confirmed that at that time m_len is zero. Found some debug notes showing how crazy these numbers are: offset 80000395000 length 3533d50a37ee6ddb, lblk 80000395 llblk d0a3827b lblk 80000395 llblk d0a3827b, m_lblk 80000395 m_len 50a37ee7 end d0a3827b, m_lblk 80000395 m_len 50a37ee7 offset d0a3827c000 length 3533cffffffffddb, lblk d0a3827c llblk d0a3827b lblk d0a3827c llblk d0a3827b, m_lblk d0a3827c m_len 0 end d0a3827b, m_lblk d0a3827c m_len 0 ------------[ cut here ]------------ WARNING: CPU: 6 PID: 7962 at fs/ext4/extents_status.c:266 __es_find_extent_range+0x102/0x120 [ext4] Thanks. > > > > > > > > > > Ok, so the problem mainly is coming since ext4_map_blocks() > > > is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. > > > > > > So why change last_lblk? > > > > I guess because we need to make sure a sane length value. In the loop > > in iomap_fiemap, start and length are not checked, assuming be checked > > by caller. If length get overflowed, the start value for the next loop > > can also be affected, which makes lblk last_lblk and m_len to go crazy. > > Sorry I didn't it explain it right maybe. So if we are anyway changing > lblk by truncating it and making sure map.m_len is not getting > overflowed (as we did in my previous patch), then we need not play with > last_lblk anyways. > > And FWIW, instead of truncating lblk just so that ext4_map_blocks() > doesn't WARN, we can as well just return -ENOENT for > lblk >= EXT4_MAX_LOGICAL_BLOCK. ENOENT makes more sense to me, > but please feel free to correct me here. > > Thoughts? > > Meanwhile, I will also play this change (-ENOENT) a bit to at least get > few of the known test cases covered. > > > Also I do had this question for ext4. > EXT4_MAX_BLOCKS explaination says that's the max *number* of logical > blocks in a file. So since it is the number of blocks, it is equivalent > of length. Whereas the EXT4_MAX_LOGICAL_BLOCK says the max logical block > of a file, which is equivalent of offset. > Considering the logical offset starts from 0, so as Ted was saying > having both values same doesn't make sense. Ideally maybe > EXT4_MAX_LOGICAL_BLOCK should be 0xFFFFFFFFE. > > But that may also require some careful checking of all bounds of length > and offset across the code. So maybe we can revisit this later. > /* > * Maximum number of logical blocks in a file; ext4_extent's ee_block is > * __le32. > */ > #define EXT_MAX_BLOCKS 0xffffffff > > > /* Max logical block we can support */ > #define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFF > > > -ritesh > > > > > Thanks. > > > > > Shouldn't we just change the logic to return -ENOENT in case > > > if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by > > > IOMAP APIs to abort the loop properly. > > > This along with the map.m_len overlflow patch which I had submitted > > > before. (since the overflow patch is anyway a valid fix which we anyways > > > need). > > > > > > -ritesh > > > > > > > > > > if (flags & IOMAP_WRITE) > > > > ret = ext4_iomap_alloc(inode, &map, flags); > > > > @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > > > > bool delalloc = false; > > > > struct ext4_map_blocks map; > > > > u8 blkbits = inode->i_blkbits; > > > > + ext4_lblk_t lblk = offset >> blkbits; > > > > + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; > > > > - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) > > > > + if (lblk > EXT4_MAX_LOGICAL_BLOCK) > > > > return -EINVAL; > > > > if (ext4_has_inline_data(inode)) { > > > > @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, > > > > /* > > > > * Calculate the first and last logical block respectively. > > > > */ > > > > - map.m_lblk = offset >> blkbits; > > > > - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, > > > > - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; > > > > + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) > > > > + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > > > + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) > > > > + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; > > > > + > > > > + map.m_lblk = lblk; > > > > + map.m_len = last_lblk - lblk + 1; > > > > + if (map.m_len == 0 ) > > > > + map.m_len = 1; > > > > /* > > > > * Fiemap callers may call for offset beyond s_bitmap_maxbytes. > > > > > > > > > >
Hello Murphy, On 4/20/20 12:57 PM, Murphy Zhou wrote: > On Mon, Apr 20, 2020 at 09:46:01AM +0530, Ritesh Harjani wrote: >> >> >> On 4/20/20 8:27 AM, Murphy Zhou wrote: >>> On Sun, Apr 19, 2020 at 09:49:27PM +0530, Ritesh Harjani wrote: >>>> Hello Ted, >>>> >>>> On 4/19/20 10:16 AM, Theodore Y. Ts'o wrote: >>>> >>>>> ext4_map_block() is returning EFSCORRUPTED when lblk is >>>>> EXT4_MAX_LOGICAL_BLOCK, which is why he's constraining lblk to >>>>> EXT4_MAX_LOGICAL_BLOCK. I haven't looked into this more closely yet, >>>> >>>> Yes, I did mention about this case in point 2 in below link though. >>>> But maybe I was only focused on testing syzcaller reproducer, so >>>> couldn't test this reported case. >>>> >>>> https://www.spinics.net/lists/linux-ext4/msg71387.html >>>> >>>> >>>>> On Sun, Apr 19, 2020 at 12:42:24AM -0400, Theodore Y. Ts'o wrote: >>>>>> I think we need to take his patch, and make a simialr change to >>>>>> ext4_iomap_begin(). Ritesh, do you agree? >>>>> >>>>> For example... >>>>> >>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>>> index 2a4aae6acdcb..adce3339d697 100644 >>>>> --- a/fs/ext4/inode.c >>>>> +++ b/fs/ext4/inode.c >>>>> @@ -3424,8 +3424,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >>>>> int ret; >>>>> struct ext4_map_blocks map; >>>>> u8 blkbits = inode->i_blkbits; >>>>> + ext4_lblk_t lblk = offset >> blkbits; >>>>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >>>> >>>> Why play with last_lblk but? >>>> >>>> >>>> >>>>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>>>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>>>> return -EINVAL; >>>>> if (WARN_ON_ONCE(ext4_has_inline_data(inode))) >>>>> @@ -3434,9 +3436,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >>>>> /* >>>>> * Calculate the first and last logical blocks respectively. >>>>> */ >>>>> - map.m_lblk = offset >> blkbits; >>>>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>>>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>>>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>>> + >>>>> + map.m_lblk = lblk; >>>>> + map.m_len = last_lblk - lblk + 1; >>>>> + if (map.m_len == 0 ) >>>>> + map.m_len = 1; >>>> >>>> Not sure but with above changes map.m_len will never be >>>> 0. Right? >>> >>> Yes. If it's 0, in ext4_iomap_is_delalloc we will get an "end" that >>> is less then m_lblk, causing another WARN in ext4_es_find_extent_range. >> >> Sorry lost you. Ok so what I meant above is. >> With your changes made in above code to truncate last_lblk >> and lblk, we may never end up in a situation where map.m_len will be 0. >> So the below check in your code, isn't it redundant? >> I wanted to double confirm this with you. >> >> + if (map.m_len == 0 ) >> + map.m_len = 1; > > No it's not redundant. I hit and said that wo/ these two lines we will > hit a WARN later. Ok, so thanks for the logs. I should have figured this out earlier but duh, missed it again. Your values are too big and your variable 'last_lblk' is of type 'ext4_lblk_t' (which is u32). So what you maybe seeing is an overflow case where sometimes your last_lblk is becoming just 1 less than map.m_lblk and thus your are getting map.m_len to be 0. Can you pls carefully review and confirm now this at your end? _(My reasoning behind was this)_ Because for m_len to become 0, we should have lblk = last_lblk + 1. This can only happen if length passed is 0. Or last_lblk got overflowed and become less then lblk. Now AFAIU, length passed as argument cannot be 0. -ritesh > > At first I thought truncating values is enough, but it's not. > generic/013 (fsstress) can hit the WARN in fs/ext4/extents_status.c:266 > easily. > > By printk values confirmed that at that time m_len is zero. > > Found some debug notes showing how crazy these numbers are: > > offset 80000395000 length 3533d50a37ee6ddb, lblk 80000395 llblk d0a3827b > lblk 80000395 llblk d0a3827b, m_lblk 80000395 m_len 50a37ee7 > end d0a3827b, m_lblk 80000395 m_len 50a37ee7 > offset d0a3827c000 length 3533cffffffffddb, lblk d0a3827c llblk d0a3827b > lblk d0a3827c llblk d0a3827b, m_lblk d0a3827c m_len 0 > end d0a3827b, m_lblk d0a3827c m_len 0 > ------------[ cut here ]------------ > WARNING: CPU: 6 PID: 7962 at fs/ext4/extents_status.c:266 __es_find_extent_range+0x102/0x120 [ext4] > > Thanks. >> >> >>> >>>> >>>> Ok, so the problem mainly is coming since ext4_map_blocks() >>>> is returning -EFSCORRUPTED in case if lblk >= EXT4_MAX_LOGICAL_BLOCK. >>>> >>>> So why change last_lblk? >>> >>> I guess because we need to make sure a sane length value. In the loop >>> in iomap_fiemap, start and length are not checked, assuming be checked >>> by caller. If length get overflowed, the start value for the next loop >>> can also be affected, which makes lblk last_lblk and m_len to go crazy. >> >> Sorry I didn't it explain it right maybe. So if we are anyway changing >> lblk by truncating it and making sure map.m_len is not getting >> overflowed (as we did in my previous patch), then we need not play with >> last_lblk anyways. >> >> And FWIW, instead of truncating lblk just so that ext4_map_blocks() >> doesn't WARN, we can as well just return -ENOENT for >> lblk >= EXT4_MAX_LOGICAL_BLOCK. ENOENT makes more sense to me, >> but please feel free to correct me here. >> >> Thoughts? >> >> Meanwhile, I will also play this change (-ENOENT) a bit to at least get >> few of the known test cases covered. >> >> >> Also I do had this question for ext4. >> EXT4_MAX_BLOCKS explaination says that's the max *number* of logical >> blocks in a file. So since it is the number of blocks, it is equivalent >> of length. Whereas the EXT4_MAX_LOGICAL_BLOCK says the max logical block >> of a file, which is equivalent of offset. >> Considering the logical offset starts from 0, so as Ted was saying >> having both values same doesn't make sense. Ideally maybe >> EXT4_MAX_LOGICAL_BLOCK should be 0xFFFFFFFFE. >> >> But that may also require some careful checking of all bounds of length >> and offset across the code. So maybe we can revisit this later. >> /* >> * Maximum number of logical blocks in a file; ext4_extent's ee_block is >> * __le32. >> */ >> #define EXT_MAX_BLOCKS 0xffffffff >> >> >> /* Max logical block we can support */ >> #define EXT4_MAX_LOGICAL_BLOCK 0xFFFFFFFF >> >> >> -ritesh >> >>> >>> Thanks. >>> >>>> Shouldn't we just change the logic to return -ENOENT in case >>>> if (lblk >= EXT4_MAX_LOGICAL_BLOCK)? ENOENT can be handled by >>>> IOMAP APIs to abort the loop properly. >>>> This along with the map.m_len overlflow patch which I had submitted >>>> before. (since the overflow patch is anyway a valid fix which we anyways >>>> need). >>>> >>>> -ritesh >>>> >>>> >>>>> if (flags & IOMAP_WRITE) >>>>> ret = ext4_iomap_alloc(inode, &map, flags); >>>>> @@ -3524,8 +3532,10 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, >>>>> bool delalloc = false; >>>>> struct ext4_map_blocks map; >>>>> u8 blkbits = inode->i_blkbits; >>>>> + ext4_lblk_t lblk = offset >> blkbits; >>>>> + ext4_lblk_t last_lblk = (offset + length - 1) >> blkbits; >>>>> - if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) >>>>> + if (lblk > EXT4_MAX_LOGICAL_BLOCK) >>>>> return -EINVAL; >>>>> if (ext4_has_inline_data(inode)) { >>>>> @@ -3540,9 +3550,15 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, >>>>> /* >>>>> * Calculate the first and last logical block respectively. >>>>> */ >>>>> - map.m_lblk = offset >> blkbits; >>>>> - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, >>>>> - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; >>>>> + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>>> + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>>> + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) >>>>> + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; >>>>> + >>>>> + map.m_lblk = lblk; >>>>> + map.m_len = last_lblk - lblk + 1; >>>>> + if (map.m_len == 0 ) >>>>> + map.m_len = 1; >>>>> /* >>>>> * Fiemap callers may call for offset beyond s_bitmap_maxbytes. >>>>> >>>> >>> >> >
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e416096..3620417 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3523,6 +3523,8 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, int ret; bool delalloc = false; struct ext4_map_blocks map; + ext4_lblk_t last_lblk; + ext4_lblk_t lblk; u8 blkbits = inode->i_blkbits; if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) @@ -3540,9 +3542,18 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, /* * Calculate the first and last logical block respectively. */ - map.m_lblk = offset >> blkbits; - map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, - EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + lblk = offset >> blkbits; + last_lblk = (offset + length - 1) >> blkbits; + + if (last_lblk >= EXT4_MAX_LOGICAL_BLOCK) + last_lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + if (lblk >= EXT4_MAX_LOGICAL_BLOCK) + lblk = EXT4_MAX_LOGICAL_BLOCK - 1; + + map.m_lblk = lblk; + map.m_len = last_lblk - lblk + 1; + if (map.m_len == 0 ) + map.m_len = 1; /* * Fiemap callers may call for offset beyond s_bitmap_maxbytes.
Sometimes crazy userspace values can be here causing overflow issue. After moved ext4_fiemap to using the iomap framework in commit d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") we can hit the WARN_ON at fs/iomap/apply.c:51, then get an EIO error running xfstests generic/009 (and some others) on ext4 based overlayfs. The minimal reproducer is: ------------------------------------- fallocate -l 256M test.img mkfs.ext4 -Fq -b 4096 -I 256 test.img mkdir -p test mount -o loop test.img test || exit pushd test rm -rf l u w m mkdir -p l u w m mount -t overlay -o lowerdir=l,upperdir=u,workdir=w overlay m || exit xfs_io -f -c "pwrite 0 4096" -c "fiemap" m/tf umount m rm -rf l u w m popd umount -d test rm -rf test test.img ------------------------------------- Because we run fiemap command wo/ the offset and length parameters, xfs_io set values based on fs blocksize etc which is got from the mounted fs. These values xfs_io passed are way larger on overlayfs than ext4 directly. So we can't reproduce this directly on ext4 or xfs. I tried to call ioctl directly with large length value but failed to reproduce this. I did not try to get what values xfs_io exactly passing in, but I confirmed that overflowed value when it made into _ext4_fiemap. It's a length of 0x7fffffffffffffff which will mess up the calculation of map.m_lblk and map.m_len, make map.m_len to be 0, then hit WARN_ON and get EIO in iomap_apply. Fixing this by ensuring the offset and length values wont exceed EXT4_MAX_LOGICAL_BLOCK. Also make sure that the length would not be zero because of crazy overflowed values. This patch has been tested with LTP/xfstests showing no new issue. Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: d3b6f23f7167 ("ext4: move ext4_fiemap to use iomap framework") --- fs/ext4/inode.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)