ext4: remove metadata reservation checks

Message ID	1403490981-5457-1-git-send-email-tytso@mit.edu
State	New, archived
Headers	show Return-Path: <linux-ext4-owner@vger.kernel.org> From: Theodore Ts'o <tytso@mit.edu> To: Ext4 Developers List <linux-ext4@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Subject: [PATCH] ext4: remove metadata reservation checks Date: Sun, 22 Jun 2014 22:36:21 -0400 Message-Id: <1403490981-5457-1-git-send-email-tytso@mit.edu> Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk

Message ID

1403490981-5457-1-git-send-email-tytso@mit.edu

State

New, archived

Headers

From: Theodore Ts'o <tytso@mit.edu>
To: Ext4 Developers List <linux-ext4@vger.kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Subject: [PATCH] ext4: remove metadata reservation checks
Date: Sun, 22 Jun 2014 22:36:21 -0400
Message-Id: <1403490981-5457-1-git-send-email-tytso@mit.edu>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk

Commit Message

Theodore Ts'o June 23, 2014, 2:36 a.m. UTC

Commit 27dd43854227b ("ext4: introduce reserved space") reserves 2% of
the file system space to make sure metadata allocations will always
succeed.  Given that, tracking the reservation of metadata blocks is
no longer necessary.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/balloc.c  |  1 -
 fs/ext4/ext4.h    |  1 -
 fs/ext4/extents.c |  3 +-
 fs/ext4/inode.c   | 93 +++----------------------------------------------------
 fs/ext4/mballoc.c | 15 +--------
 5 files changed, 7 insertions(+), 106 deletions(-)

Comments

Lukas Czerner June 23, 2014, 9:56 a.m. UTC | #1

On Sun, 22 Jun 2014, Theodore Ts'o wrote:

> Date: Sun, 22 Jun 2014 22:36:21 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH] ext4: remove metadata reservation checks
> 
> Commit 27dd43854227b ("ext4: introduce reserved space") reserves 2% of
> the file system space to make sure metadata allocations will always
> succeed.  Given that, tracking the reservation of metadata blocks is
> no longer necessary.

Hi Ted,

I was a bit reluctant to do this when I introduced reserved space
since I though that we need to run with it for some time to make
sure that it's solid.

However I am still on the fence about this patch, because it was not
designed, at least initially to cover all metadata reservation, but
mainly those we sometimes can not predict (unwritten extent
conversion for example), or those when the prediction failed (in
this case we would see a warning).

I think that if we can be really sure, that the reserved space will
always have enough space to cover all possible metadata blocks
needed on writeback time (is there any other time we might need
metadata block and can not fail with ENOSPC?), then this patch is
definitely very useful.

However I am not sure how to prove that. I think that the size of the
journal might be an indication of how much "reserved" metadata block
we might need at any given time, but I am not entirely sure about
that. But if true, we probably need to modify the metadata
reservation code to take that into account.

So my question is, do you think that there is a way to prove that we
will have enough reserved blocks at any time for metadata allocation ?
If so, what is the metric we use to set the size of the reservation pool
in the first place ?

Thanks!
-Lukas

> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  fs/ext4/balloc.c  |  1 -
>  fs/ext4/ext4.h    |  1 -
>  fs/ext4/extents.c |  3 +-
>  fs/ext4/inode.c   | 93 +++----------------------------------------------------
>  fs/ext4/mballoc.c | 15 +--------
>  5 files changed, 7 insertions(+), 106 deletions(-)
> 
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 0762d14..807071d 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -623,7 +623,6 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
>  	if (!(*errp) &&
>  	    ext4_test_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED)) {
>  		spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> -		EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
>  		spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>  		dquot_alloc_block_nofail(inode,
>  				EXT4_C2B(EXT4_SB(inode->i_sb), ar.len));
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 7cc5a0e..d35c78c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -591,7 +591,6 @@ enum {
>  #define EXT4_FREE_BLOCKS_NO_QUOT_UPDATE	0x0008
>  #define EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER	0x0010
>  #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER	0x0020
> -#define EXT4_FREE_BLOCKS_RESERVE		0x0040
>  
>  /*
>   * ioctl commands
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 4da228a..b30172d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1808,8 +1808,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle,
>  
>  	brelse(path[1].p_bh);
>  	ext4_free_blocks(handle, inode, NULL, blk, 1,
> -			 EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET |
> -			 EXT4_FREE_BLOCKS_RESERVE);
> +			 EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
>  }
>  
>  /*
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8a06473..4ccb624 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -357,35 +357,10 @@ void ext4_da_update_reserve_space(struct inode *inode,
>  		used = ei->i_reserved_data_blocks;
>  	}
>  
> -	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
> -		ext4_warning(inode->i_sb, "ino %lu, allocated %d "
> -			"with only %d reserved metadata blocks "
> -			"(releasing %d blocks with reserved %d data blocks)",
> -			inode->i_ino, ei->i_allocated_meta_blocks,
> -			     ei->i_reserved_meta_blocks, used,
> -			     ei->i_reserved_data_blocks);
> -		WARN_ON(1);
> -		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
> -	}
> -
>  	/* Update per-inode reservations */
>  	ei->i_reserved_data_blocks -= used;
> -	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
> -	percpu_counter_sub(&sbi->s_dirtyclusters_counter,
> -			   used + ei->i_allocated_meta_blocks);
> -	ei->i_allocated_meta_blocks = 0;
> +	percpu_counter_sub(&sbi->s_dirtyclusters_counter, used);
>  
> -	if (ei->i_reserved_data_blocks == 0) {
> -		/*
> -		 * We can release all of the reserved metadata blocks
> -		 * only when we have written all of the delayed
> -		 * allocation blocks.
> -		 */
> -		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
> -				   ei->i_reserved_meta_blocks);
> -		ei->i_reserved_meta_blocks = 0;
> -		ei->i_da_metadata_calc_len = 0;
> -	}
>  	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>  
>  	/* Update quota subsystem for data blocks */
> @@ -1232,35 +1207,6 @@ static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock)
>  	ext4_lblk_t save_last_lblock;
>  	int save_len;
>  
> -	/*
> -	 * recalculate the amount of metadata blocks to reserve
> -	 * in order to allocate nrblocks
> -	 * worse case is one extent per block
> -	 */
> -	spin_lock(&ei->i_block_reservation_lock);
> -	/*
> -	 * ext4_calc_metadata_amount() has side effects, which we have
> -	 * to be prepared undo if we fail to claim space.
> -	 */
> -	save_len = ei->i_da_metadata_calc_len;
> -	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
> -	md_needed = EXT4_NUM_B2C(sbi,
> -				 ext4_calc_metadata_amount(inode, lblock));
> -	trace_ext4_da_reserve_space(inode, md_needed);
> -
> -	/*
> -	 * We do still charge estimated metadata to the sb though;
> -	 * we cannot afford to run out of free blocks.
> -	 */
> -	if (ext4_claim_free_clusters(sbi, md_needed, 0)) {
> -		ei->i_da_metadata_calc_len = save_len;
> -		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
> -		spin_unlock(&ei->i_block_reservation_lock);
> -		return -ENOSPC;
> -	}
> -	ei->i_reserved_meta_blocks += md_needed;
> -	spin_unlock(&ei->i_block_reservation_lock);
> -
>  	return 0;       /* success */
>  }
>  
> @@ -1295,25 +1241,15 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
>  	 * ext4_calc_metadata_amount() has side effects, which we have
>  	 * to be prepared undo if we fail to claim space.
>  	 */
> -	save_len = ei->i_da_metadata_calc_len;
> -	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
> -	md_needed = EXT4_NUM_B2C(sbi,
> -				 ext4_calc_metadata_amount(inode, lblock));
> -	trace_ext4_da_reserve_space(inode, md_needed);
> +	md_needed = 0;
> +	trace_ext4_da_reserve_space(inode, 0);
>  
> -	/*
> -	 * We do still charge estimated metadata to the sb though;
> -	 * we cannot afford to run out of free blocks.
> -	 */
> -	if (ext4_claim_free_clusters(sbi, md_needed + 1, 0)) {
> -		ei->i_da_metadata_calc_len = save_len;
> -		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
> +	if (ext4_claim_free_clusters(sbi, 1, 0)) {
>  		spin_unlock(&ei->i_block_reservation_lock);
>  		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
>  		return -ENOSPC;
>  	}
>  	ei->i_reserved_data_blocks++;
> -	ei->i_reserved_meta_blocks += md_needed;
>  	spin_unlock(&ei->i_block_reservation_lock);
>  
>  	return 0;       /* success */
> @@ -1346,20 +1282,6 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>  	}
>  	ei->i_reserved_data_blocks -= to_free;
>  
> -	if (ei->i_reserved_data_blocks == 0) {
> -		/*
> -		 * We can release all of the reserved metadata blocks
> -		 * only when we have written all of the delayed
> -		 * allocation blocks.
> -		 * Note that in case of bigalloc, i_reserved_meta_blocks,
> -		 * i_reserved_data_blocks, etc. refer to number of clusters.
> -		 */
> -		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
> -				   ei->i_reserved_meta_blocks);
> -		ei->i_reserved_meta_blocks = 0;
> -		ei->i_da_metadata_calc_len = 0;
> -	}
> -
>  	/* update fs dirty data blocks counter */
>  	percpu_counter_sub(&sbi->s_dirtyclusters_counter, to_free);
>  
> @@ -1500,10 +1422,6 @@ static void ext4_print_free_blocks(struct inode *inode)
>  	ext4_msg(sb, KERN_CRIT, "Block reservation details");
>  	ext4_msg(sb, KERN_CRIT, "i_reserved_data_blocks=%u",
>  		 ei->i_reserved_data_blocks);
> -	ext4_msg(sb, KERN_CRIT, "i_reserved_meta_blocks=%u",
> -	       ei->i_reserved_meta_blocks);
> -	ext4_msg(sb, KERN_CRIT, "i_allocated_meta_blocks=%u",
> -	       ei->i_allocated_meta_blocks);
>  	return;
>  }
>  
> @@ -2843,8 +2761,7 @@ int ext4_alloc_da_blocks(struct inode *inode)
>  {
>  	trace_ext4_alloc_da_blocks(inode);
>  
> -	if (!EXT4_I(inode)->i_reserved_data_blocks &&
> -	    !EXT4_I(inode)->i_reserved_meta_blocks)
> +	if (!EXT4_I(inode)->i_reserved_data_blocks)
>  		return 0;
>  
>  	/*
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 59e3162..4503b8f 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4619,7 +4619,6 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
>  	struct buffer_head *gd_bh;
>  	ext4_group_t block_group;
>  	struct ext4_sb_info *sbi;
> -	struct ext4_inode_info *ei = EXT4_I(inode);
>  	struct ext4_buddy e4b;
>  	unsigned int count_clusters;
>  	int err = 0;
> @@ -4830,19 +4829,7 @@ do_more:
>  			     &sbi->s_flex_groups[flex_group].free_clusters);
>  	}
>  
> -	if (flags & EXT4_FREE_BLOCKS_RESERVE && ei->i_reserved_data_blocks) {
> -		percpu_counter_add(&sbi->s_dirtyclusters_counter,
> -				   count_clusters);
> -		spin_lock(&ei->i_block_reservation_lock);
> -		if (flags & EXT4_FREE_BLOCKS_METADATA)
> -			ei->i_reserved_meta_blocks += count_clusters;
> -		else
> -			ei->i_reserved_data_blocks += count_clusters;
> -		spin_unlock(&ei->i_block_reservation_lock);
> -		if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
> -			dquot_reclaim_block(inode,
> -					EXT4_C2B(sbi, count_clusters));
> -	} else if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
> +	if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
>  		dquot_free_block(inode, EXT4_C2B(sbi, count_clusters));
>  	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o June 23, 2014, 12:59 p.m. UTC | #2

On Mon, Jun 23, 2014 at 11:56:23AM +0200, Lukáš Czerner wrote:
> 
> However I am still on the fence about this patch, because it was not
> designed, at least initially to cover all metadata reservation, but
> mainly those we sometimes can not predict (unwritten extent
> conversion for example), or those when the prediction failed (in
> this case we would see a warning).

I was driven to revisit this because we have a map reduce workload
that triggers the warning fairly consistently, and it's dirtying up
our logs and monitoring systems.  The prediction algorithm we are
using is actually pretty awful, unfortunately, and fixing it to do a
better job is non-trivial.

> I think that if we can be really sure, that the reserved space will
> always have enough space to cover all possible metadata blocks
> needed on writeback time (is there any other time we might need
> metadata block and can not fail with ENOSPC?), then this patch is
> definitely very useful.

The only time we need blocks where we can't fail with ENOSPC is
delayed allocation writeback and the unwritten extent conversion.  And
in both cases, the number of blocks we need are quite small; we can,
after all, fit 340 entries into each extent tree block.

The main worry I might have is the worst case scenario where have a
very small file system.  For example, if you create a file system
which is only 58k, we only have one free block.  But then again, such
a file system only has 4 free blocks, so it's physically impossible to
have any extent tree blocks.  :-)

Here's a potential carefully structured case where 2% of the free
blocks wouldn't be enough.  I'll let folks decide if we think this is
realistic enough that we need to care.  Suppose we have a 4M file
system, using 4k blocks, we would only have 51 reserved blocks.  On
such a file system, there would be 982 free blocks available to be
allocated.  If you then had 200 inodes that had exactly 4 extents
(using 4 blocks written using sparse writes), then there would be 182
free blocks.  If we then posted 100 sparse writes to half of these
inodes, we could end up using 100 data blocks, and also require 100
metadata blocks for the extent tree splits --- and we would then hit
the ENOSPC failure condition.

So this is a "proof" that 2% is quite enough for small file systems.
Do we care?  Eh.  I'm not at all convinced such a worst case scenario
could ever happen in real life, but we could fix this by adding a
"floor" to the 2% calculation so that we reserve at least 128 or 256
blocks.  Or we already have code which disables delayed allocation if
we are close to full, and we could extend that to cover super small
file systems, or simply entirely disable delalloc for super small file
systems in the first case.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner June 27, 2014, 11:29 a.m. UTC | #3

On Mon, 23 Jun 2014, Theodore Ts'o wrote:

> Date: Mon, 23 Jun 2014 08:59:58 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Subject: Re: [PATCH] ext4: remove metadata reservation checks
> 
> On Mon, Jun 23, 2014 at 11:56:23AM +0200, Lukáš Czerner wrote:
> > 
> > However I am still on the fence about this patch, because it was not
> > designed, at least initially to cover all metadata reservation, but
> > mainly those we sometimes can not predict (unwritten extent
> > conversion for example), or those when the prediction failed (in
> > this case we would see a warning).
> 
> I was driven to revisit this because we have a map reduce workload
> that triggers the warning fairly consistently, and it's dirtying up
> our logs and monitoring systems.  The prediction algorithm we are
> using is actually pretty awful, unfortunately, and fixing it to do a
> better job is non-trivial.

Fair enough. I think that removing all the metadata reservation in
favour of the pool of reserved blocks for metadata is the right way
to go. We get rid of some of the complexity and it will become
easier to make sure that we will have enough blocks for metadata
allocation.

> 
> > I think that if we can be really sure, that the reserved space will
> > always have enough space to cover all possible metadata blocks
> > needed on writeback time (is there any other time we might need
> > metadata block and can not fail with ENOSPC?), then this patch is
> > definitely very useful.
> 
> The only time we need blocks where we can't fail with ENOSPC is
> delayed allocation writeback and the unwritten extent conversion.  And
> in both cases, the number of blocks we need are quite small; we can,
> after all, fit 340 entries into each extent tree block.
> 
> The main worry I might have is the worst case scenario where have a
> very small file system.  For example, if you create a file system
> which is only 58k, we only have one free block.  But then again, such
> a file system only has 4 free blocks, so it's physically impossible to
> have any extent tree blocks.  :-)
> 
> Here's a potential carefully structured case where 2% of the free
> blocks wouldn't be enough.  I'll let folks decide if we think this is
> realistic enough that we need to care.  Suppose we have a 4M file
> system, using 4k blocks, we would only have 51 reserved blocks.  On
> such a file system, there would be 982 free blocks available to be
> allocated.  If you then had 200 inodes that had exactly 4 extents
> (using 4 blocks written using sparse writes), then there would be 182
> free blocks.  If we then posted 100 sparse writes to half of these
> inodes, we could end up using 100 data blocks, and also require 100
> metadata blocks for the extent tree splits --- and we would then hit
> the ENOSPC failure condition.

That is a perfect test case :) But seriously I think that we might
want to do better than just hand wave 2% and be done with it. It is
rather arbitrary number I've used, but it was supposed to be a last
resort solution, not the regular allocation pattern.

Also for huge file system this pool might be unnecessarily large, so
we might want to come up with a better heuristic to guess the size
of the pool.

On the other hand, this is something we can add later as well as
export better information about the pool utilization to the user. I
am just trying to be careful because this really is a big change and
there might be some unexpected consequences and we do not have
exactly huge amount of enospc regression tests.

> 
> So this is a "proof" that 2% is quite enough for small file systems.
> Do we care?  Eh.  I'm not at all convinced such a worst case scenario
> could ever happen in real life, but we could fix this by adding a
> "floor" to the 2% calculation so that we reserve at least 128 or 256
> blocks.  Or we already have code which disables delayed allocation if
> we are close to full, and we could extend that to cover super small
> file systems, or simply entirely disable delalloc for super small file
> systems in the first case.

yes this is yet another thing we can use to make it more solid.

Thanks!
-Lukas
> 
> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 0762d14..807071d 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -623,7 +623,6 @@  ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
 	if (!(*errp) &&
 	    ext4_test_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED)) {
 		spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
-		EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
 		spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 		dquot_alloc_block_nofail(inode,
 				EXT4_C2B(EXT4_SB(inode->i_sb), ar.len));
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7cc5a0e..d35c78c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -591,7 +591,6 @@  enum {
 #define EXT4_FREE_BLOCKS_NO_QUOT_UPDATE	0x0008
 #define EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER	0x0010
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER	0x0020
-#define EXT4_FREE_BLOCKS_RESERVE		0x0040
 
 /*
  * ioctl commands
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4da228a..b30172d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1808,8 +1808,7 @@  static void ext4_ext_try_to_merge_up(handle_t *handle,
 
 	brelse(path[1].p_bh);
 	ext4_free_blocks(handle, inode, NULL, blk, 1,
-			 EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET |
-			 EXT4_FREE_BLOCKS_RESERVE);
+			 EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
 }
 
 /*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8a06473..4ccb624 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -357,35 +357,10 @@  void ext4_da_update_reserve_space(struct inode *inode,
 		used = ei->i_reserved_data_blocks;
 	}
 
-	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
-		ext4_warning(inode->i_sb, "ino %lu, allocated %d "
-			"with only %d reserved metadata blocks "
-			"(releasing %d blocks with reserved %d data blocks)",
-			inode->i_ino, ei->i_allocated_meta_blocks,
-			     ei->i_reserved_meta_blocks, used,
-			     ei->i_reserved_data_blocks);
-		WARN_ON(1);
-		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
-	}
-
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
-	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
-	percpu_counter_sub(&sbi->s_dirtyclusters_counter,
-			   used + ei->i_allocated_meta_blocks);
-	ei->i_allocated_meta_blocks = 0;
+	percpu_counter_sub(&sbi->s_dirtyclusters_counter, used);
 
-	if (ei->i_reserved_data_blocks == 0) {
-		/*
-		 * We can release all of the reserved metadata blocks
-		 * only when we have written all of the delayed
-		 * allocation blocks.
-		 */
-		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
-				   ei->i_reserved_meta_blocks);
-		ei->i_reserved_meta_blocks = 0;
-		ei->i_da_metadata_calc_len = 0;
-	}
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 
 	/* Update quota subsystem for data blocks */
@@ -1232,35 +1207,6 @@  static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock)
 	ext4_lblk_t save_last_lblock;
 	int save_len;
 
-	/*
-	 * recalculate the amount of metadata blocks to reserve
-	 * in order to allocate nrblocks
-	 * worse case is one extent per block
-	 */
-	spin_lock(&ei->i_block_reservation_lock);
-	/*
-	 * ext4_calc_metadata_amount() has side effects, which we have
-	 * to be prepared undo if we fail to claim space.
-	 */
-	save_len = ei->i_da_metadata_calc_len;
-	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
-	md_needed = EXT4_NUM_B2C(sbi,
-				 ext4_calc_metadata_amount(inode, lblock));
-	trace_ext4_da_reserve_space(inode, md_needed);
-
-	/*
-	 * We do still charge estimated metadata to the sb though;
-	 * we cannot afford to run out of free blocks.
-	 */
-	if (ext4_claim_free_clusters(sbi, md_needed, 0)) {
-		ei->i_da_metadata_calc_len = save_len;
-		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
-		spin_unlock(&ei->i_block_reservation_lock);
-		return -ENOSPC;
-	}
-	ei->i_reserved_meta_blocks += md_needed;
-	spin_unlock(&ei->i_block_reservation_lock);
-
 	return 0;       /* success */
 }
 
@@ -1295,25 +1241,15 @@  static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
 	 * ext4_calc_metadata_amount() has side effects, which we have
 	 * to be prepared undo if we fail to claim space.
 	 */
-	save_len = ei->i_da_metadata_calc_len;
-	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
-	md_needed = EXT4_NUM_B2C(sbi,
-				 ext4_calc_metadata_amount(inode, lblock));
-	trace_ext4_da_reserve_space(inode, md_needed);
+	md_needed = 0;
+	trace_ext4_da_reserve_space(inode, 0);
 
-	/*
-	 * We do still charge estimated metadata to the sb though;
-	 * we cannot afford to run out of free blocks.
-	 */
-	if (ext4_claim_free_clusters(sbi, md_needed + 1, 0)) {
-		ei->i_da_metadata_calc_len = save_len;
-		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
+	if (ext4_claim_free_clusters(sbi, 1, 0)) {
 		spin_unlock(&ei->i_block_reservation_lock);
 		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
 		return -ENOSPC;
 	}
 	ei->i_reserved_data_blocks++;
-	ei->i_reserved_meta_blocks += md_needed;
 	spin_unlock(&ei->i_block_reservation_lock);
 
 	return 0;       /* success */
@@ -1346,20 +1282,6 @@  static void ext4_da_release_space(struct inode *inode, int to_free)
 	}
 	ei->i_reserved_data_blocks -= to_free;
 
-	if (ei->i_reserved_data_blocks == 0) {
-		/*
-		 * We can release all of the reserved metadata blocks
-		 * only when we have written all of the delayed
-		 * allocation blocks.
-		 * Note that in case of bigalloc, i_reserved_meta_blocks,
-		 * i_reserved_data_blocks, etc. refer to number of clusters.
-		 */
-		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
-				   ei->i_reserved_meta_blocks);
-		ei->i_reserved_meta_blocks = 0;
-		ei->i_da_metadata_calc_len = 0;
-	}
-
 	/* update fs dirty data blocks counter */
 	percpu_counter_sub(&sbi->s_dirtyclusters_counter, to_free);
 
@@ -1500,10 +1422,6 @@  static void ext4_print_free_blocks(struct inode *inode)
 	ext4_msg(sb, KERN_CRIT, "Block reservation details");
 	ext4_msg(sb, KERN_CRIT, "i_reserved_data_blocks=%u",
 		 ei->i_reserved_data_blocks);
-	ext4_msg(sb, KERN_CRIT, "i_reserved_meta_blocks=%u",
-	       ei->i_reserved_meta_blocks);
-	ext4_msg(sb, KERN_CRIT, "i_allocated_meta_blocks=%u",
-	       ei->i_allocated_meta_blocks);
 	return;
 }
 
@@ -2843,8 +2761,7 @@  int ext4_alloc_da_blocks(struct inode *inode)
 {
 	trace_ext4_alloc_da_blocks(inode);
 
-	if (!EXT4_I(inode)->i_reserved_data_blocks &&
-	    !EXT4_I(inode)->i_reserved_meta_blocks)
+	if (!EXT4_I(inode)->i_reserved_data_blocks)
 		return 0;
 
 	/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 59e3162..4503b8f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4619,7 +4619,6 @@  void ext4_free_blocks(handle_t *handle, struct inode *inode,
 	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	struct ext4_sb_info *sbi;
-	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_buddy e4b;
 	unsigned int count_clusters;
 	int err = 0;
@@ -4830,19 +4829,7 @@  do_more:
 			     &sbi->s_flex_groups[flex_group].free_clusters);
 	}
 
-	if (flags & EXT4_FREE_BLOCKS_RESERVE && ei->i_reserved_data_blocks) {
-		percpu_counter_add(&sbi->s_dirtyclusters_counter,
-				   count_clusters);
-		spin_lock(&ei->i_block_reservation_lock);
-		if (flags & EXT4_FREE_BLOCKS_METADATA)
-			ei->i_reserved_meta_blocks += count_clusters;
-		else
-			ei->i_reserved_data_blocks += count_clusters;
-		spin_unlock(&ei->i_block_reservation_lock);
-		if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
-			dquot_reclaim_block(inode,
-					EXT4_C2B(sbi, count_clusters));
-	} else if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
+	if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
 		dquot_free_block(inode, EXT4_C2B(sbi, count_clusters));
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);

ext4: remove metadata reservation checks

Commit Message

Comments

Patch