diff mbox

[v2,19/28] mke2fs: add inline_data support in mke2fs

Message ID 1386072715-9869-20-git-send-email-wenqing.lz@taobao.com
State Superseded, archived
Headers show

Commit Message

Zheng Liu Dec. 3, 2013, 12:11 p.m. UTC
From: Zheng Liu <wenqing.lz@taobao.com>

Now inline_data doesn't depend on ext_attr.  Hence we don't need to do
this sanity check.  But if the inode size is too small (128 bytes),
inline_data will be useless because we couldn't save data in ibody
extented attribute.  So we need to report this error.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 misc/mke2fs.8.in |    3 +++
 misc/mke2fs.c    |   18 ++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

Comments

Darrick Wong Dec. 3, 2013, 10:30 p.m. UTC | #1
On Tue, Dec 03, 2013 at 08:11:46PM +0800, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> Now inline_data doesn't depend on ext_attr.  Hence we don't need to do
> this sanity check.  But if the inode size is too small (128 bytes),
> inline_data will be useless because we couldn't save data in ibody
> extented attribute.  So we need to report this error.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  misc/mke2fs.8.in |    3 +++
>  misc/mke2fs.c    |   18 ++++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
> index 7b89296..c5e1abb 100644
> --- a/misc/mke2fs.8.in
> +++ b/misc/mke2fs.8.in
> @@ -587,6 +587,9 @@ option).
>  @JDEV@must be created with the same
>  @JDEV@block size as the filesystems that will be using it.
>  .TP
> +.B inline_data
> +Allow data to be stored in the inode and extented attribute area

"extended"

> +.TP
>  .B large_file
>  Filesystem can contain files that are greater than 2GB.  (Modern kernels
>  set this feature automatically when a file > 2GB is created.)
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 67c9225..0a3880f 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -929,7 +929,8 @@ static __u32 ok_features[3] = {
>  		EXT2_FEATURE_INCOMPAT_META_BG|
>  		EXT4_FEATURE_INCOMPAT_FLEX_BG|
>  		EXT4_FEATURE_INCOMPAT_MMP |
> -		EXT4_FEATURE_INCOMPAT_64BIT,
> +		EXT4_FEATURE_INCOMPAT_64BIT|
> +		EXT4_FEATURE_INCOMPAT_INLINE_DATA,
>  	/* R/O compat */
>  	EXT2_FEATURE_RO_COMPAT_LARGE_FILE|
>  		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
> @@ -2069,7 +2070,8 @@ profile_error:
>  				  "See https://ext4.wiki.kernel.org/"
>  				  "index.php/Quota for more information\n\n"));
>  
> -	/* Since sparse_super is the default, we would only have a problem
> +	/*
> +	 * Since sparse_super is the default, we would only have a problem
>  	 * here if it was explicitly disabled.
>  	 */
>  	if ((fs_param.s_feature_compat & EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
> @@ -2125,6 +2127,18 @@ profile_error:
>  				blocksize);
>  			exit(1);
>  		}
> +		/*
> +		 * If inode size is 128 and inline data is enable, we need to

"is enabled"

> +		 * notify users that inline data will never be useful.
> +		 */
> +		if ((fs_param.s_feature_incompat &
> +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {

Perhaps I'm missing something here, but why is it impossible to use i_blocks
for inline data even if there's no space for EAs?

--D

> +			com_err(program_name, 0,
> +				_("inode size is %d, inline data is useless"),
> +				inode_size);
> +			exit(1);
> +		}
>  		fs_param.s_inode_size = inode_size;
>  	}
>  
> -- 
> 1.7.9.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Dec. 4, 2013, 3:27 a.m. UTC | #2
On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
[...]
> > +		 * notify users that inline data will never be useful.
> > +		 */
> > +		if ((fs_param.s_feature_incompat &
> > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> 
> Perhaps I'm missing something here, but why is it impossible to use i_blocks
> for inline data even if there's no space for EAs?

If I understand correctly, on kernel side, we determine an inode has
inline data according to whether we have 'system.data' xattr entry on
inode extended attribute space.  If an inode doesn't have enough space
to store an entry with 'system.data', we just think this inode doesn't
has inline data.  So that is why I add this sanity check.

Regards,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick Wong Dec. 4, 2013, 4:08 a.m. UTC | #3
On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> [...]
> > > +		 * notify users that inline data will never be useful.
> > > +		 */
> > > +		if ((fs_param.s_feature_incompat &
> > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > 
> > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > for inline data even if there's no space for EAs?
> 
> If I understand correctly, on kernel side, we determine an inode has
> inline data according to whether we have 'system.data' xattr entry on
> inode extended attribute space.  If an inode doesn't have enough space
> to store an entry with 'system.data', we just think this inode doesn't
> has inline data.  So that is why I add this sanity check.

Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
restriction to me, but oh well, it's your feature.  I don't plan to go back to
128 byte inodes ever. :)

Also, we could store four more bytes if we created a new e_name_index value (5?
9?) to represent "system.data".  Any thoughts about that?

--D

> 
> Regards,
>                                                 - Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Dec. 4, 2013, 5:21 a.m. UTC | #4
On Tue, Dec 03, 2013 at 08:08:19PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> > On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> > [...]
> > > > +		 * notify users that inline data will never be useful.
> > > > +		 */
> > > > +		if ((fs_param.s_feature_incompat &
> > > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > > 
> > > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > > for inline data even if there's no space for EAs?
> > 
> > If I understand correctly, on kernel side, we determine an inode has
> > inline data according to whether we have 'system.data' xattr entry on
> > inode extended attribute space.  If an inode doesn't have enough space
> > to store an entry with 'system.data', we just think this inode doesn't
> > has inline data.  So that is why I add this sanity check.
> 
> Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
> restriction to me, but oh well, it's your feature.  I don't plan to go back to
> 128 byte inodes ever. :)
> 
> Also, we could store four more bytes if we created a new e_name_index value (5?
> 9?) to represent "system.data".  Any thoughts about that?

Sorry, I don't get your point.  Do you want to create a new e_name_index?
Any reason lets you want to do this?

                                                - Zheng

> 
> --D
> 
> > 
> > Regards,
> >                                                 - Zheng
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick Wong Dec. 4, 2013, 5:26 a.m. UTC | #5
On Wed, Dec 04, 2013 at 01:21:50PM +0800, Zheng Liu wrote:
> On Tue, Dec 03, 2013 at 08:08:19PM -0800, Darrick J. Wong wrote:
> > On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> > > On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> > > [...]
> > > > > +		 * notify users that inline data will never be useful.
> > > > > +		 */
> > > > > +		if ((fs_param.s_feature_incompat &
> > > > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > > > 
> > > > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > > > for inline data even if there's no space for EAs?
> > > 
> > > If I understand correctly, on kernel side, we determine an inode has
> > > inline data according to whether we have 'system.data' xattr entry on
> > > inode extended attribute space.  If an inode doesn't have enough space
> > > to store an entry with 'system.data', we just think this inode doesn't
> > > has inline data.  So that is why I add this sanity check.
> > 
> > Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
> > restriction to me, but oh well, it's your feature.  I don't plan to go back to
> > 128 byte inodes ever. :)
> > 
> > Also, we could store four more bytes if we created a new e_name_index value (5?
> > 9?) to represent "system.data".  Any thoughts about that?
> 
> Sorry, I don't get your point.  Do you want to create a new e_name_index?
> Any reason lets you want to do this?

Yep, that's exactly what I propose to do, so we can cram four more bytes into
the inline data.

--D
> 
>                                                 - Zheng
> 
> > 
> > --D
> > 
> > > 
> > > Regards,
> > >                                                 - Zheng
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Dec. 4, 2013, 5:50 a.m. UTC | #6
[Cc Tao to get some comments]

On Tue, Dec 03, 2013 at 09:26:08PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 04, 2013 at 01:21:50PM +0800, Zheng Liu wrote:
> > On Tue, Dec 03, 2013 at 08:08:19PM -0800, Darrick J. Wong wrote:
> > > On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> > > > On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> > > > [...]
> > > > > > +		 * notify users that inline data will never be useful.
> > > > > > +		 */
> > > > > > +		if ((fs_param.s_feature_incompat &
> > > > > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > > > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > > > > 
> > > > > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > > > > for inline data even if there's no space for EAs?
> > > > 
> > > > If I understand correctly, on kernel side, we determine an inode has
> > > > inline data according to whether we have 'system.data' xattr entry on
> > > > inode extended attribute space.  If an inode doesn't have enough space
> > > > to store an entry with 'system.data', we just think this inode doesn't
> > > > has inline data.  So that is why I add this sanity check.
> > > 
> > > Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
> > > restriction to me, but oh well, it's your feature.  I don't plan to go back to
> > > 128 byte inodes ever. :)
> > > 
> > > Also, we could store four more bytes if we created a new e_name_index value (5?
> > > 9?) to represent "system.data".  Any thoughts about that?
> > 
> > Sorry, I don't get your point.  Do you want to create a new e_name_index?
> > Any reason lets you want to do this?
> 
> Yep, that's exactly what I propose to do, so we can cram four more bytes into
> the inline data.

Agree.  I believe it is fine.  But I am wondering if it will break the
file system that inline data has been enabled.

                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick Wong Dec. 4, 2013, 6:02 a.m. UTC | #7
On Wed, Dec 04, 2013 at 01:50:52PM +0800, Zheng Liu wrote:
> [Cc Tao to get some comments]
> 
> On Tue, Dec 03, 2013 at 09:26:08PM -0800, Darrick J. Wong wrote:
> > On Wed, Dec 04, 2013 at 01:21:50PM +0800, Zheng Liu wrote:
> > > On Tue, Dec 03, 2013 at 08:08:19PM -0800, Darrick J. Wong wrote:
> > > > On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> > > > > On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> > > > > [...]
> > > > > > > +		 * notify users that inline data will never be useful.
> > > > > > > +		 */
> > > > > > > +		if ((fs_param.s_feature_incompat &
> > > > > > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > > > > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > > > > > 
> > > > > > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > > > > > for inline data even if there's no space for EAs?
> > > > > 
> > > > > If I understand correctly, on kernel side, we determine an inode has
> > > > > inline data according to whether we have 'system.data' xattr entry on
> > > > > inode extended attribute space.  If an inode doesn't have enough space
> > > > > to store an entry with 'system.data', we just think this inode doesn't
> > > > > has inline data.  So that is why I add this sanity check.
> > > > 
> > > > Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
> > > > restriction to me, but oh well, it's your feature.  I don't plan to go back to
> > > > 128 byte inodes ever. :)
> > > > 
> > > > Also, we could store four more bytes if we created a new e_name_index value (5?
> > > > 9?) to represent "system.data".  Any thoughts about that?
> > > 
> > > Sorry, I don't get your point.  Do you want to create a new e_name_index?
> > > Any reason lets you want to do this?
> > 
> > Yep, that's exactly what I propose to do, so we can cram four more bytes into
> > the inline data.
> 
> Agree.  I believe it is fine.  But I am wondering if it will break the
> file system that inline data has been enabled.

There's a fair amount of work needed for fs/ext4/inline.c.  My e2fsprogs thing
should handle it fine, though I think the inlinedata_max_size function
somewhere in your patchset might also be broken.

I suspect a lot depends on how widely deployed inlinedata is inside Taobao, or
anyone else who's actually running it right now.

--D
> 
>                                                 - Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Dec. 4, 2013, 6:18 a.m. UTC | #8
On Tue, Dec 03, 2013 at 10:02:24PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 04, 2013 at 01:50:52PM +0800, Zheng Liu wrote:
> > [Cc Tao to get some comments]
> > 
> > On Tue, Dec 03, 2013 at 09:26:08PM -0800, Darrick J. Wong wrote:
> > > On Wed, Dec 04, 2013 at 01:21:50PM +0800, Zheng Liu wrote:
> > > > On Tue, Dec 03, 2013 at 08:08:19PM -0800, Darrick J. Wong wrote:
> > > > > On Wed, Dec 04, 2013 at 11:27:57AM +0800, Zheng Liu wrote:
> > > > > > On Tue, Dec 03, 2013 at 02:30:26PM -0800, Darrick J. Wong wrote:
> > > > > > [...]
> > > > > > > > +		 * notify users that inline data will never be useful.
> > > > > > > > +		 */
> > > > > > > > +		if ((fs_param.s_feature_incompat &
> > > > > > > > +		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
> > > > > > > > +		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
> > > > > > > 
> > > > > > > Perhaps I'm missing something here, but why is it impossible to use i_blocks
> > > > > > > for inline data even if there's no space for EAs?
> > > > > > 
> > > > > > If I understand correctly, on kernel side, we determine an inode has
> > > > > > inline data according to whether we have 'system.data' xattr entry on
> > > > > > inode extended attribute space.  If an inode doesn't have enough space
> > > > > > to store an entry with 'system.data', we just think this inode doesn't
> > > > > > has inline data.  So that is why I add this sanity check.
> > > > > 
> > > > > Ok.  I was curious.  Small inode => no inline data seems like an unfortunate
> > > > > restriction to me, but oh well, it's your feature.  I don't plan to go back to
> > > > > 128 byte inodes ever. :)
> > > > > 
> > > > > Also, we could store four more bytes if we created a new e_name_index value (5?
> > > > > 9?) to represent "system.data".  Any thoughts about that?
> > > > 
> > > > Sorry, I don't get your point.  Do you want to create a new e_name_index?
> > > > Any reason lets you want to do this?
> > > 
> > > Yep, that's exactly what I propose to do, so we can cram four more bytes into
> > > the inline data.
> > 
> > Agree.  I believe it is fine.  But I am wondering if it will break the
> > file system that inline data has been enabled.
> 
> There's a fair amount of work needed for fs/ext4/inline.c.  My e2fsprogs thing
> should handle it fine, though I think the inlinedata_max_size function
> somewhere in your patchset might also be broken.
> 
> I suspect a lot depends on how widely deployed inlinedata is inside Taobao, or
> anyone else who's actually running it right now.

That is my concern because at Taobao we have already used inline data on
our own servers.  So obviously it will break our file system. :(

                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index 7b89296..c5e1abb 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -587,6 +587,9 @@  option).
 @JDEV@must be created with the same
 @JDEV@block size as the filesystems that will be using it.
 .TP
+.B inline_data
+Allow data to be stored in the inode and extented attribute area
+.TP
 .B large_file
 Filesystem can contain files that are greater than 2GB.  (Modern kernels
 set this feature automatically when a file > 2GB is created.)
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 67c9225..0a3880f 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -929,7 +929,8 @@  static __u32 ok_features[3] = {
 		EXT2_FEATURE_INCOMPAT_META_BG|
 		EXT4_FEATURE_INCOMPAT_FLEX_BG|
 		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_64BIT,
+		EXT4_FEATURE_INCOMPAT_64BIT|
+		EXT4_FEATURE_INCOMPAT_INLINE_DATA,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE|
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -2069,7 +2070,8 @@  profile_error:
 				  "See https://ext4.wiki.kernel.org/"
 				  "index.php/Quota for more information\n\n"));
 
-	/* Since sparse_super is the default, we would only have a problem
+	/*
+	 * Since sparse_super is the default, we would only have a problem
 	 * here if it was explicitly disabled.
 	 */
 	if ((fs_param.s_feature_compat & EXT2_FEATURE_COMPAT_RESIZE_INODE) &&
@@ -2125,6 +2127,18 @@  profile_error:
 				blocksize);
 			exit(1);
 		}
+		/*
+		 * If inode size is 128 and inline data is enable, we need to
+		 * notify users that inline data will never be useful.
+		 */
+		if ((fs_param.s_feature_incompat &
+		     EXT4_FEATURE_INCOMPAT_INLINE_DATA) &&
+		    inode_size == EXT2_GOOD_OLD_INODE_SIZE) {
+			com_err(program_name, 0,
+				_("inode size is %d, inline data is useless"),
+				inode_size);
+			exit(1);
+		}
 		fs_param.s_inode_size = inode_size;
 	}