diff mbox

[11/28] ext4: correctly calculate number of blocks for fiemap

Message ID 4B8E1410.1010107@rs.jp.nec.com
State Rejected, archived
Headers show

Commit Message

Akira Fujita March 3, 2010, 7:47 a.m. UTC
Hi,

 > From: Leonard Michlmayr<leonard.michlmayr@gmail.com>
 >
 > ext4_fiemap() rounds the length of the requested range down to
 > blocksize, which is is not the true number of blocks that cover the
 > requested region.  This problem is especially impressive if the user
 > requests only the first byte of a file: not a single extent will be
 > reported.

<snip>

 >   		start_blk = start>>  inode->i_sb->s_blocksize_bits;
 > -		len_blks = len>>  inode->i_sb->s_blocksize_bits;
 > +		last_blk = (start + len - 1)>>  inode->i_sb->s_blocksize_bits;
 > +		len_blks = last_blk - start_blk + 1;

In 1KB block size, the overflow occurs at above line.
Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
Therefore ext4_fiemap() can not get correct extent information
with 0 length.  How about adding this change?

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
---
  extents.c |    3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


(2010/03/03 3:18), Theodore Ts'o wrote:
> From: Leonard Michlmayr<leonard.michlmayr@gmail.com>
>
> ext4_fiemap() rounds the length of the requested range down to
> blocksize, which is is not the true number of blocks that cover the
> requested region.  This problem is especially impressive if the user
> requests only the first byte of a file: not a single extent will be
> reported.
>
> We fix this by calculating the last block of the region and then
> subtract to find the number of blocks in the extents.
>
> Signed-off-by: Leonard Michlmayr<leonard.michlmayr@gmail.com>
> Signed-off-by: "Theodore Ts'o"<tytso@mit.edu>
> ---
>   fs/ext4/extents.c |    6 ++++--
>   1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index bd80891..bc9860f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>   		__u64 start, __u64 len)
>   {
>   	ext4_lblk_t start_blk;
> -	ext4_lblk_t len_blks;
>   	int error = 0;
>
>   	/* fallback to generic here if not in extents fmt */
> @@ -3782,8 +3781,11 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>   	if (fieinfo->fi_flags&  FIEMAP_FLAG_XATTR) {
>   		error = ext4_xattr_fiemap(inode, fieinfo);
>   	} else {
> +		ext4_lblk_t last_blk, len_blks;
> +
>   		start_blk = start>>  inode->i_sb->s_blocksize_bits;
> -		len_blks = len>>  inode->i_sb->s_blocksize_bits;
> +		last_blk = (start + len - 1)>>  inode->i_sb->s_blocksize_bits;
> +		len_blks = last_blk - start_blk + 1;
>
>   		/*
>   		 * Walk the extent tree gathering extent information.

Comments

Leonard Michlmayr March 3, 2010, 8:34 a.m. UTC | #1
Hi

I think that the patch is valid as it is.

<snip>
>  >   		start_blk = start>>  inode->i_sb->s_blocksize_bits;
>  > -		len_blks = len>>  inode->i_sb->s_blocksize_bits;
>  > +		last_blk = (start + len - 1)>>  inode->i_sb->s_blocksize_bits;
>  > +		len_blks = last_blk - start_blk + 1;
> 
> In 1KB block size, the overflow occurs at above line.
> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
> Therefore ext4_fiemap() can not get correct extent information
> with 0 length.  How about adding this change?
> 

I have considered this possibility, but len == 0 is an invalid request
and would be sorted out by fs/ioctl.c:fiemap_check_ranges. If you want
to make sure that everything is correct even if len == 0 slips through
consider Andreas Dilger's solution which does not introduce a new branch
to the code:

(Here end_blk is one more than the last block)

end_blk = (start + len + inode->i_sb->s_blocksize - 1) >> inode->i_sb->s_blocksize_bits;
len_blks = end_blk - start_blk;

I think we don't need to copy the functionality of fiemap_check_ranges.
At least if there a no danger that len == 0 will be allowed in the
specification in future?

> Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
> ---
>   extents.c |    3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> --- linux-2.6.33-rc8-a/fs/ext4/extents.c        2010-03-03 14:53:49.000000000 +0900
> +++ linux-2.6.33-rc8-b/fs/ext4/extents.c        2010-03-03 15:31:57.000000000 +0900
> @@ -3912,7 +3912,8 @@ int ext4_fiemap(struct inode *inode, str
> 
>                  start_blk = start >> inode->i_sb->s_blocksize_bits;
>                  last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
> -               len_blks = last_blk - start_blk + 1;
> +               len_blks = (loff_t)last_blk - start_blk + 1 < EXT_MAX_BLOCK ?
> +                               last_blk - start_blk + 1 : EXT_MAX_BLOCK;
> 
>                  /*
>                   * Walk the extent tree gathering extent information.
> 
> 
> (2010/03/03 3:18), Theodore Ts'o wrote:
> > From: Leonard Michlmayr<leonard.michlmayr@gmail.com>
> >
> > ext4_fiemap() rounds the length of the requested range down to
> > blocksize, which is is not the true number of blocks that cover the
> > requested region.  This problem is especially impressive if the user
> > requests only the first byte of a file: not a single extent will be
> > reported.
> >
> > We fix this by calculating the last block of the region and then
> > subtract to find the number of blocks in the extents.
> >
> > Signed-off-by: Leonard Michlmayr<leonard.michlmayr@gmail.com>
> > Signed-off-by: "Theodore Ts'o"<tytso@mit.edu>
> > ---
> >   fs/ext4/extents.c |    6 ++++--
> >   1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index bd80891..bc9860f 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >   		__u64 start, __u64 len)
> >   {
> >   	ext4_lblk_t start_blk;
> > -	ext4_lblk_t len_blks;
> >   	int error = 0;
> >
> >   	/* fallback to generic here if not in extents fmt */
> > @@ -3782,8 +3781,11 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >   	if (fieinfo->fi_flags&  FIEMAP_FLAG_XATTR) {
> >   		error = ext4_xattr_fiemap(inode, fieinfo);
> >   	} else {
> > +		ext4_lblk_t last_blk, len_blks;
> > +
> >   		start_blk = start>>  inode->i_sb->s_blocksize_bits;
> > -		len_blks = len>>  inode->i_sb->s_blocksize_bits;
> > +		last_blk = (start + len - 1)>>  inode->i_sb->s_blocksize_bits;
> > +		len_blks = last_blk - start_blk + 1;
> >
> >   		/*
> >   		 * Walk the extent tree gathering extent information.
> 




--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 3, 2010, 5:52 p.m. UTC | #2
On Wed, Mar 03, 2010 at 04:47:28PM +0900, Akira Fujita wrote:
> 
> In 1KB block size, the overflow occurs at above line.
> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
> Therefore ext4_fiemap() can not get correct extent information
> with 0 length.  How about adding this change?

What do you think _is_ the correct thing to do when length is 0?  As
Leonard has already pointed out, fiemap_check_ranges() already filters
out the length=0 case.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Akira Fujita March 4, 2010, 5:40 a.m. UTC | #3
(2010/03/04 2:52), tytso@mit.edu wrote:
> On Wed, Mar 03, 2010 at 04:47:28PM +0900, Akira Fujita wrote:
>>
>> In 1KB block size, the overflow occurs at above line.
>> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
>> Therefore ext4_fiemap() can not get correct extent information
>> with 0 length.  How about adding this change?
> 
> What do you think _is_ the correct thing to do when length is 0?  As
> Leonard has already pointed out, fiemap_check_ranges() already filters
> out the length=0 case.

/mnt/mp1/file1
 ext logical physical expected length flags
   0       0    49153            8192
   1    8192    77825    57344   2048 eof

For example, we do filefrag command for a above file (file1).
FS_IOC_FIEMAP tries to get whole extents information of file1,
so the output has to be 2 extents.
In this case, fm_length (requested block length) is passed
from the user-space to the kernel-space, as follows:

<user-space>
filefrag:
	fiemap->fm_start(0) fiemap->fm_length(~0ULL)

<kernel-space>
fs/ioctl.c ioctl_fimap():

filemap_check_ranges():
	len(~0ULL)
	new_len(4398046511103 = s_maxbytes)  <--- Because 'len > s_maxbytes'

fs/ext4/extents.c ext4_fiemap():
	last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
	         = 4294967295 (0xFFFFFFFF)
	len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
		 = 4294967296 (0x100000000)  <--- _OVERFLOW!!_
	
ext4_ext_walk_space():
	num = 0

This overflow leads to incorrect output like the below,
even though 2 extents exist.

[root@bsd086 akira]# filefrag -v  /mnt/mp1/file1
Filesystem type is: ef53
File size of /mnt/mp1/file1 is 10485760 (10240 blocks, blocksize 1024)
 ext logical physical expected length flags
/mnt/mp1/file6: 1 extent found

Regards,
Akira Fujita
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leonard Michlmayr March 4, 2010, 9:44 p.m. UTC | #4
Akira Fujita:
> <kernel-space>
> fs/ioctl.c ioctl_fimap():
> 
> filemap_check_ranges():
> 	len(~0ULL)
> 	new_len(4398046511103 = s_maxbytes)  <--- Because 'len > s_maxbytes'
> 
> fs/ext4/extents.c ext4_fiemap():
> 	last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
> 	         = 4294967295 (0xFFFFFFFF)
> 	len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
> 		 = 4294967296 (0x100000000)  <--- _OVERFLOW!!_
> 	
> ext4_ext_walk_space():
> 	num = 0
> 
> This overflow leads to incorrect output like the below,
> even though 2 extents exist.
> 

Thank you for pointing this out. I had not checked s_maxbytes.
Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
the number of blocks in a file cannot be stored in a 32bit integer.

I have a patch that should fix it for fiemap. I have just compiled it
and I will do some testing and double checking tomorrow. I will send a
separate email with the patch.

regards
Leonard


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 4, 2010, 11:38 p.m. UTC | #5
Leonard Michlmayr wrote:
> Akira Fujita:
>> <kernel-space>
>> fs/ioctl.c ioctl_fimap():
>>
>> filemap_check_ranges():
>> 	len(~0ULL)
>> 	new_len(4398046511103 = s_maxbytes)  <--- Because 'len > s_maxbytes'
>>
>> fs/ext4/extents.c ext4_fiemap():
>> 	last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
>> 	         = 4294967295 (0xFFFFFFFF)
>> 	len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
>> 		 = 4294967296 (0x100000000)  <--- _OVERFLOW!!_
>> 	
>> ext4_ext_walk_space():
>> 	num = 0
>>
>> This overflow leads to incorrect output like the below,
>> even though 2 extents exist.
>>
> 
> Thank you for pointing this out. I had not checked s_maxbytes.
> Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
> the number of blocks in a file cannot be stored in a 32bit integer.

For extent-based files it had better be....

struct ext4_extent {
        __le32  ee_block;       /* first logical block extent covers */
        __le16  ee_len;         /* number of blocks covered by extent */

The start block can't be more than 32 bits; this essentially limits
the file size / maximum logical block to 2^32 blocks right?

s_maxbytes comes out to 17592186044415

2^32 4k blocks is 17592186044416 bytes, or max byte offset 17592186044415

What am I missing?  (confusion between max byte count and max
byte offset, perhaps?)

-Eric

> I have a patch that should fix it for fiemap. I have just compiled it
> and I will do some testing and double checking tomorrow. I will send a
> separate email with the patch.
> 
> regards
> Leonard
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leonard Michlmayr March 5, 2010, 4:46 p.m. UTC | #6
2010/3/5 Eric Sandeen <sandeen@redhat.com>
>
> > Thank you for pointing this out. I had not checked s_maxbytes.
> > Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
> > the number of blocks in a file cannot be stored in a 32bit integer.
>
> For extent-based files it had better be....
>
> struct ext4_extent {
>        __le32  ee_block;       /* first logical block extent covers */
>        __le16  ee_len;         /* number of blocks covered by extent */
>
> The start block can't be more than 32 bits; this essentially limits
> the file size / maximum logical block to 2^32 blocks right?
>
> s_maxbytes comes out to 17592186044415
>
> 2^32 4k blocks is 17592186044416 bytes, or max byte offset 17592186044415
>
> What am I missing?  (confusion between max byte count and max
> byte offset, perhaps?)
>

Yes. The block offset has the maximum value 1<<32 - 1.
However the number of blocks may be 1<<32.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- linux-2.6.33-rc8-a/fs/ext4/extents.c        2010-03-03 14:53:49.000000000 +0900
+++ linux-2.6.33-rc8-b/fs/ext4/extents.c        2010-03-03 15:31:57.000000000 +0900
@@ -3912,7 +3912,8 @@  int ext4_fiemap(struct inode *inode, str

                 start_blk = start >> inode->i_sb->s_blocksize_bits;
                 last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
-               len_blks = last_blk - start_blk + 1;
+               len_blks = (loff_t)last_blk - start_blk + 1 < EXT_MAX_BLOCK ?
+                               last_blk - start_blk + 1 : EXT_MAX_BLOCK;

                 /*
                  * Walk the extent tree gathering extent information.