diff mbox series

[RESEND,v2] ext4: Optimization of no-op ext4_truncate triggers

Message ID 20241016111624.5229-1-linmaxi@gmail.com
State New
Headers show
Series [RESEND,v2] ext4: Optimization of no-op ext4_truncate triggers | expand

Commit Message

Max Brener Oct. 16, 2024, 11:16 a.m. UTC
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306
v1: https://lore.kernel.org/lkml/20240926221103.24423-1-linmaxi@gmail.com/T/

Changes from last version: Moved vfs-level changes to be ext4-level,
and improved the description of the patch.

This patch enables skipping no-op 'ext4_truncate' calls. Analyzing the kernel
with ftrace shows ext4_truncate is being sometimes called without making any
impact, and sometimes userspace programs might call ext4_truncate in vein. By 
detecting these calls and skipping them, cpu time is saved.

I'll fix this by skipping ext4_truncate call in 'ext4_setattr' when the file's size
hasn't changed AND it hasn't been truncated since the last disk space preallocation.
It is meant to consider the case when ext4_truncate is being called to truncate
preallocated blocks too. Notice that so far, the condition to triggering 
ext4_truncate by the user was: if (attr->ia_size <= oldsize) which means it is
being triggered when attr->ia_size == oldsize regardless of whether there are
preallocated blocks or not - if there are none, then the call is redundant.

Steps:
1.Add a new inode state flag: EXT4_STATE_TRUNCATED
2.Clear the flag when ext4_fallocate is being called with FALLOC_FL_KEEP_SIZE flag
to enable using ext4_truncate again, to remove preallocated disk space that may
have resulted from this call.
3.Set EXT4_STATE_TRUNCATED when ext4_truncated is called successfully.
4.Don't skip ext4_truncate in ext4_setattr when the size of the file has either been
reduced OR stayed the same, but hasn't been truncated yet. This is in order to allow
truncating of preallocated blocks.

Signed-off-by: Max Brener <linmaxi@gmail.com>
---
 fs/ext4/ext4.h    | 1 +
 fs/ext4/extents.c | 4 ++++
 fs/ext4/inode.c   | 6 +++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

Comments

Theodore Ts'o Nov. 7, 2024, 5:03 a.m. UTC | #1
On Wed, Oct 16, 2024 at 02:16:24PM +0300, Max Brener wrote:
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306
> v1: https://lore.kernel.org/lkml/20240926221103.24423-1-linmaxi@gmail.com/T/
> 
> Changes from last version: Moved vfs-level changes to be ext4-level,
> and improved the description of the patch.
> 
> This patch enables skipping no-op 'ext4_truncate' calls. Analyzing the kernel
> with ftrace shows ext4_truncate is being sometimes called without making any
> impact, and sometimes userspace programs might call ext4_truncate in vein. By 
> detecting these calls and skipping them, cpu time is saved.
> 
> I'll fix this by skipping ext4_truncate call in 'ext4_setattr' when the file's size
> hasn't changed AND it hasn't been truncated since the last disk space preallocation.
> It is meant to consider the case when ext4_truncate is being called to truncate
> preallocated blocks too. Notice that so far, the condition to triggering 
> ext4_truncate by the user was: if (attr->ia_size <= oldsize) which means it is
> being triggered when attr->ia_size == oldsize regardless of whether there are
> preallocated blocks or not - if there are none, then the call is redundant.
> 
> Steps:
> 1.Add a new inode state flag: EXT4_STATE_TRUNCATED
> 2.Clear the flag when ext4_fallocate is being called with FALLOC_FL_KEEP_SIZE flag
> to enable using ext4_truncate again, to remove preallocated disk space that may
> have resulted from this call.
> 3.Set EXT4_STATE_TRUNCATED when ext4_truncated is called successfully.
> 4.Don't skip ext4_truncate in ext4_setattr when the size of the file has either been
> reduced OR stayed the same, but hasn't been truncated yet. This is in order to allow
> truncating of preallocated blocks.

This patch is still not quite right.  See Jan's comment from [1]:

   Agreed as well. I'll also note that keeping such flag uptodate is not as
   simple as it seems because there are various places that may be allocating
   blocks beyond EOF (for example extending writes) and that rely on
   ext4_truncate() removing them so one needs to be careful to capture all the
   places where the "truncated" state needs to be cleared.

[1] https://lore.kernel.org/all/20240930095601.x66iqw74bxffytgq@quack3/

						- Ted
Max Brener Nov. 12, 2024, 3:44 p.m. UTC | #2
‫בתאריך יום ה׳, 7 בנוב׳ 2024 ב-7:03 מאת ‪Theodore Ts'o‬‏ <‪tytso@mit.edu‬‏>:‬
>
> On Wed, Oct 16, 2024 at 02:16:24PM +0300, Max Brener wrote:
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219306
> > v1: https://lore.kernel.org/lkml/20240926221103.24423-1-linmaxi@gmail.com/T/
> >
> > Changes from last version: Moved vfs-level changes to be ext4-level,
> > and improved the description of the patch.
> >
> > This patch enables skipping no-op 'ext4_truncate' calls. Analyzing the kernel
> > with ftrace shows ext4_truncate is being sometimes called without making any
> > impact, and sometimes userspace programs might call ext4_truncate in vein. By
> > detecting these calls and skipping them, cpu time is saved.
> >
> > I'll fix this by skipping ext4_truncate call in 'ext4_setattr' when the file's size
> > hasn't changed AND it hasn't been truncated since the last disk space preallocation.
> > It is meant to consider the case when ext4_truncate is being called to truncate
> > preallocated blocks too. Notice that so far, the condition to triggering
> > ext4_truncate by the user was: if (attr->ia_size <= oldsize) which means it is
> > being triggered when attr->ia_size == oldsize regardless of whether there are
> > preallocated blocks or not - if there are none, then the call is redundant.
> >
> > Steps:
> > 1.Add a new inode state flag: EXT4_STATE_TRUNCATED
> > 2.Clear the flag when ext4_fallocate is being called with FALLOC_FL_KEEP_SIZE flag
> > to enable using ext4_truncate again, to remove preallocated disk space that may
> > have resulted from this call.
> > 3.Set EXT4_STATE_TRUNCATED when ext4_truncated is called successfully.
> > 4.Don't skip ext4_truncate in ext4_setattr when the size of the file has either been
> > reduced OR stayed the same, but hasn't been truncated yet. This is in order to allow
> > truncating of preallocated blocks.
>
> This patch is still not quite right.  See Jan's comment from [1]:
>
>    Agreed as well. I'll also note that keeping such flag uptodate is not as
>    simple as it seems because there are various places that may be allocating
>    blocks beyond EOF (for example extending writes) and that rely on
>    ext4_truncate() removing them so one needs to be careful to capture all the
>    places where the "truncated" state needs to be cleared.
>
> [1] https://lore.kernel.org/all/20240930095601.x66iqw74bxffytgq@quack3/
>
>                                                 - Ted

Okay I understand now, I initially thought any preallocation is necessarily done
through a VFS interface. Now that I see preallocations are done at mballoc,
what I can offer is to clear the TRUNCATED flag at ext4_mb_new_blocks().
Would that be ok in your opinion?
diff mbox series

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 44b0d418143c..34128a88e88f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1915,6 +1915,7 @@  enum {
 	EXT4_STATE_VERITY_IN_PROGRESS,	/* building fs-verity Merkle tree */
 	EXT4_STATE_FC_COMMITTING,	/* Fast commit ongoing */
 	EXT4_STATE_ORPHAN_FILE,		/* Inode orphaned in orphan file */
+	EXT4_STATE_TRUNCATED,		/* Inode is truncated */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field, offset)				\
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 34e25eee6521..b480c29dfc65 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4782,6 +4782,10 @@  long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		ret = ext4_zero_range(file, offset, len, mode);
 		goto exit;
 	}
+
+	if (mode & FALLOC_FL_KEEP_SIZE)
+		ext4_clear_inode_state(inode, EXT4_STATE_TRUNCATED);
+
 	trace_ext4_fallocate_enter(inode, offset, len, mode);
 	lblk = offset >> blkbits;
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54bdd4884fe6..cbdad3253920 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4193,6 +4193,8 @@  int ext4_truncate(struct inode *inode)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
 
+	ext4_set_inode_state(inode, EXT4_STATE_TRUNCATED);
+
 out_stop:
 	/*
 	 * If this was a simple ftruncate() and the file will remain alive,
@@ -5492,7 +5494,9 @@  int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		 * Call ext4_truncate() even if i_size didn't change to
 		 * truncate possible preallocated blocks.
 		 */
-		if (attr->ia_size <= oldsize) {
+		if (attr->ia_size < oldsize ||
+			(attr->ia_size == oldsize &&
+			!ext4_test_inode_state(inode, EXT4_STATE_TRUNCATED))) {
 			rc = ext4_truncate(inode);
 			if (rc)
 				error = rc;