[2/2] ext4: handle errors in ext4_clear_blocks()

Message ID

AANLkTi=zteq96AQ54f3qo29dRGCmMhwtR4BzKPXQLt0W@mail.gmail.com

State

Accepted, archived

Headers

Return-Path: <linux-ext4-owner@vger.kernel.org>
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:cc:content-type;
	b=vUIOgwrlasHsYLNVy2bSQD8XbGaoVZ3iGGpv7PPFwBk52kSieKgF300/+kc0Q8aC45
	Kad/YKFVodeBxL7iHYu+4HhTdZe/+nmz9s43mIZROi8PWdjWyuCk5Vo7p649FUmoRRux
	4A/SsiyWxI+dgBMgdR6hoUKsENtyngr8vMxBE=
MIME-Version: 1.0
Date: Tue, 1 Mar 2011 14:28:26 +0200
Message-ID: <AANLkTi=zteq96AQ54f3qo29dRGCmMhwtR4BzKPXQLt0W@mail.gmail.com>
Subject: [PATCH 2/2] ext4: handle errors in ext4_clear_blocks()
From: Amir Goldstein <amir73il@gmail.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk

Commit Message

Amir Goldstein March 1, 2011, 12:28 p.m. UTC

Checking return code from ext4_journal_get_write_access() is important
with snapshots, because this function invokes COW, so may return new
errors, such as ENOSPC.

ext4_clear_blocks() now returns < 0 for fatal errors, in which case,
ext4_free_data() is aborted.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
---
 fs/ext4/inode.c |   46 ++++++++++++++++++++++++++--------------------
 1 files changed, 26 insertions(+), 20 deletions(-)


@@ -4149,6 +4148,9 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,

 	ext4_free_blocks(handle, inode, NULL, block_to_free, count, flags);
 	return 0;
+out_err:
+	ext4_std_error(inode->i_sb, err);
+	return err;
 }

 /**
@@ -4182,7 +4184,7 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
 	ext4_fsblk_t nr;		    /* Current block # */
 	__le32 *p;			    /* Pointer into inode/ind
 					       for current block */
-	int err;
+	int err = 0;

 	if (this_bh) {				/* For indirect block */
 		BUFFER_TRACE(this_bh, "get_write_access");
@@ -4204,9 +4206,10 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
 			} else if (nr == block_to_free + count) {
 				count++;
 			} else {
-				if (ext4_clear_blocks(handle, inode, this_bh,
-						      block_to_free, count,
-						      block_to_free_p, p))
+				err = ext4_clear_blocks(handle, inode, this_bh,
+						        block_to_free, count,
+						        block_to_free_p, p);
+				if (err)
 					break;
 				block_to_free = nr;
 				block_to_free_p = p;
@@ -4215,9 +4218,12 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
 		}
 	}

-	if (count > 0)
-		ext4_clear_blocks(handle, inode, this_bh, block_to_free,
-				  count, block_to_free_p, p);
+	if (!err && count > 0)
+		err = ext4_clear_blocks(handle, inode, this_bh, block_to_free,
+					count, block_to_free_p, p);
+	if (err < 0)
+		/* fatal error */
+		return;

 	if (this_bh) {
 		BUFFER_TRACE(this_bh, "call ext4_handle_dirty_metadata");
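
The return-value convention the commit message describes (0 on success, 1 for an invalid block range, < 0 for a fatal error, with the caller giving up on a fatal one) can be modelled in a small userspace sketch. clear_run() and free_data() are hypothetical stand-ins for ext4_clear_blocks() and ext4_free_data(); the simplified validity check and the fail_at switch that simulates a COW ENOSPC are assumptions for illustration only. Unlike the patched ext4_free_data(), which returns void, the sketch hands the error back to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for ext4_clear_blocks(), using the patch's
 * convention: 0 on success, 1 for an invalid block range, negative
 * errno for a fatal error. fail_at simulates a COW failure. */
static int clear_run(unsigned long start, unsigned long count, long fail_at)
{
	if (count == 0)
		return 1;			/* simplified "invalid range" check */
	if ((long)start == fail_at)
		return -ENOSPC;			/* simulated COW failure */
	printf("free %lu..%lu\n", start, start + count - 1);
	return 0;
}

/* Hypothetical stand-in for ext4_free_data(): walk block numbers
 * (0 = hole), coalesce contiguous runs and release each run. */
static int free_data(const unsigned long *blocks, int n, long fail_at)
{
	unsigned long block_to_free = 0, count = 0;
	int err = 0;

	assert(blocks != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		unsigned long nr = blocks[i];

		if (!nr)
			continue;		/* hole: nothing to free */
		if (count == 0) {
			block_to_free = nr;	/* start a new run */
			count = 1;
		} else if (nr == block_to_free + count) {
			count++;		/* extends the current run */
		} else {
			err = clear_run(block_to_free, count, fail_at);
			if (err)
				break;
			block_to_free = nr;
			count = 1;
		}
	}
	if (!err && count > 0)
		err = clear_run(block_to_free, count, fail_at);
	if (err < 0)
		return err;			/* fatal: abort, as the patch does */
	/* The real code would now dirty the indirect block (this_bh). */
	return 0;
}
```

With blocks {10, 11, 12, 0, 20, 21, 30} and no simulated failure, the runs 10..12, 20..21 and 30..30 are released; with fail_at = 20 the second run fails and the final run is never attempted.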

Comments

Theodore Ts'o March 21, 2011, 1:32 a.m. UTC | #1

On Tue, Mar 01, 2011 at 02:28:26PM +0200, Amir Goldstein wrote:
> Checking return code from ext4_journal_get_write_access() is important
> with snapshots, because this function invokes COW, so may return new
> errors, such as ENOSPC.
> 
> ext4_clear_blocks() now returns < 0 for fatal errors, in which case,
> ext4_free_data() is aborted.

I'll apply this patch because it's not wrong, but I don't think it's
enough.  Just aborting ext4_free_data() is going to get you into
beaucoup trouble, since that happens inside ext4_truncate().  Aborting
ext4_free_data without even returning an error code is going to leave
the file system corrupted.  But aborting the truncate code (this is
only in the non-extent-mapped inode case) is going to get _awfully_
messy.

I suspect you may need to estimate how many blocks you need, and check
to make sure you can COW that many blocks, and reserve them so that you
don't have to deal with a failure in the middle of an ext4_truncate()
operation.  This is not going to be pretty....

	    	    	      	    - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
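
Ted's suggestion of estimating, checking and reserving the needed blocks before truncate starts is a reserve-then-commit pattern. A minimal sketch, assuming a hypothetical cow_pool counter rather than any real ext4 structure: the only point that can fail with ENOSPC is the initial reservation, before anything has been modified, and every later block allocation draws on blocks already set aside.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical pool of free blocks with a reserved portion; not a real
 * ext4 structure. Invariant: reserved <= free_blocks. */
struct cow_pool {
	unsigned long free_blocks;
	unsigned long reserved;
};

/* Reserve the worst-case number of blocks up front; fail early. */
static int cow_reserve(struct cow_pool *p, unsigned long needed)
{
	if (p->free_blocks - p->reserved < needed)
		return -ENOSPC;
	p->reserved += needed;
	return 0;
}

/* Take one already-reserved block; by construction this cannot fail. */
static void cow_consume(struct cow_pool *p)
{
	assert(p->reserved > 0);
	p->reserved--;
	p->free_blocks--;
}

/* Give back whatever part of the reservation was not used. */
static void cow_release(struct cow_pool *p, unsigned long unused)
{
	assert(unused <= p->reserved);
	p->reserved -= unused;
}

/* Hypothetical truncate: the only failure point is before any change. */
static int truncate_with_reservation(struct cow_pool *p,
				     unsigned long estimate,
				     unsigned long used)
{
	int err = cow_reserve(p, estimate);

	if (err)
		return err;	/* nothing modified yet, so failing is safe */
	assert(used <= estimate);	/* the estimate must be an upper bound */
	for (unsigned long i = 0; i < used; i++)
		cow_consume(p);
	cow_release(p, estimate - used);
	return 0;
}
```

The hard part Ted alludes to is producing an estimate that is a true upper bound; the sketch simply asserts that it is.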

Theodore Ts'o March 21, 2011, 1:37 a.m. UTC | #2

On Tue, Mar 01, 2011 at 02:28:26PM +0200, Amir Goldstein wrote:
> +out_err:
> +	ext4_std_error(inode->i_sb, err);
> +	return err;

Umm, you do realize this function will mark the file system as dirty,
and possibly remount the file system read-only, or panic the system,
right?

That's appropriate for normal journal failures (since there's no
recovering from them), but if you're proposing that a simple lack of
free blocks when doing a COW operation will result in a ENOSPC, all a
malicious userspace application needs to do to potentially bring the
system down is to attempt to unlink or truncate a file while COW
snapshots are enabled.....

Do you really want me to include this patch?  I think you may want to
rethink how you want to handle this case....

						- Ted
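
Ted's point is that ext4_std_error() escalates according to the errors= mount option (continue, remount-ro or panic), so routing a recoverable ENOSPC through it turns an ordinary allocation failure into a file-system-wide event. A simplified model of that dispatch, where struct fs_state and std_error() are hypothetical and abort() stands in for a kernel panic:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* The three ext4 errors= behaviours. */
enum errors_policy { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

/* Hypothetical, heavily simplified file-system state. */
struct fs_state {
	enum errors_policy policy;
	bool error_flag;	/* fs recorded as having errors (needs fsck) */
	bool read_only;
};

/* Hypothetical stand-in for ext4_std_error(): every error it sees is
 * treated as fatal, whatever its cause (journal failure or ENOSPC). */
static void std_error(struct fs_state *fs, int err)
{
	fs->error_flag = true;
	fprintf(stderr, "fs error %d\n", err);
	switch (fs->policy) {
	case ERRORS_CONTINUE:
		break;
	case ERRORS_REMOUNT_RO:
		fs->read_only = true;
		break;
	case ERRORS_PANIC:
		abort();	/* stands in for panic() */
	}
}
```

Under errors=remount-ro, a single simulated -ENOSPC leaves the model flagged and read-only, which is the escalation Ted warns userspace could trigger deliberately.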

Amir Goldstein March 21, 2011, 4:23 a.m. UTC | #3

On Mon, Mar 21, 2011 at 3:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Tue, Mar 01, 2011 at 02:28:26PM +0200, Amir Goldstein wrote:
>> +out_err:
>> +     ext4_std_error(inode->i_sb, err);
>> +     return err;
>
> Umm, you do realize this function will mark the file system as dirty,
> and possibly remount the file system read-only, or panic the system,
> right?

I am counting on that. In fact, I think we should not allow snapshots
with errors=continue; otherwise we are just as good (bad) as an LVM
snapshot that vanishes when it runs out of space.
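
Amir's proposal could take the form of a mount-time check that refuses snapshots together with errors=continue. This is a hypothetical sketch of such a check; the function name and the enum are illustrative and not existing ext4 code.

```c
#include <errno.h>
#include <stdbool.h>

enum errors_policy { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

/* Hypothetical mount-option check: snapshots are only allowed when an
 * error stops further writes (remount-ro) or stops the system (panic). */
static int check_snapshot_mount_opts(bool snapshots_enabled,
				     enum errors_policy policy)
{
	if (snapshots_enabled && policy == ERRORS_CONTINUE)
		return -EINVAL;	/* a COW failure could silently invalidate the snapshot */
	return 0;
}
```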

>
> That's appropriate for normal journal failures (since there's no
> recovering from them), but if you're proposing that a simple lack of
> free blocks when doing a COW operation will result in a ENOSPC, all a
> malicious userspace application needs to do to potentially bring the
> system down is to attempt to unlink or truncate a file while COW
> snapshots are enabled.....

It's not that easy to cause ENOSPC during COW.
s_snapshot_r_blocks holds blocks reserved for COW,
but I do want the fs to freeze if that reservation somehow runs out.
The reservation is based on a metadata size estimate, calculated
from the number of used inodes, the number of directories and the
number of used blocks. It may fall short when there are very many
hard links or very long file names, resulting in lots of directory
blocks, all of which get COWed.
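
The thread does not give the actual reservation formula, so the following is only an illustration of why an estimate driven by inode, directory and block counts can fall short. The weights in estimate_cow_reserve() are made-up assumptions; dir_blocks_needed() uses the ext4 on-disk directory entry size (an 8-byte header plus the name, rounded up to a multiple of 4 bytes). Hard links add directory entries without adding inodes, so they never show up in the inode count.

```c
#include <assert.h>

/* Hypothetical summary counters; the real formula is not in the thread. */
struct fs_counts {
	unsigned long used_inodes;
	unsigned long dirs;
	unsigned long used_blocks;
};

/* Illustrative estimate: inode-table blocks for the used inodes, one block
 * per directory, and one mapping block per 1024 data blocks. All three
 * weights are assumptions. */
static unsigned long estimate_cow_reserve(const struct fs_counts *c,
					  unsigned long block_size,
					  unsigned long inode_size)
{
	unsigned long inode_table =
		(c->used_inodes * inode_size + block_size - 1) / block_size;
	unsigned long dir_blocks = c->dirs;
	unsigned long map_blocks = (c->used_blocks + 1023) / 1024;

	return inode_table + dir_blocks + map_blocks;
}

/* Directory blocks needed for `entries` names of `name_len` bytes, using
 * the ext4 entry size (8-byte header + name, padded to a multiple of 4).
 * Ignores "." / ".." and htree index blocks. */
static unsigned long dir_blocks_needed(unsigned long entries,
				       unsigned long name_len,
				       unsigned long block_size)
{
	unsigned long rec_len = (name_len + 8 + 3) & ~3UL;
	unsigned long per_block = block_size / rec_len;

	assert(per_block > 0);
	return (entries + per_block - 1) / per_block;
}
```

With 1000 inodes, 10 directories and 10000 used blocks on a file system with 4 KiB blocks and 256-byte inodes, the sketch reserves 83 blocks, while a single directory holding 100000 hard links with 255-byte names needs 6667 directory blocks on its own.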

>
> Do you really want me to include this patch?  I think you may want to
> rethink how you want to handle this case....

I do, because with errors=remount-ro this function emits lots of
"journal has aborted" errors, which are of no use to anyone...

>
>                                                - Ted
>

Amir Goldstein March 21, 2011, 5:18 a.m. UTC | #4

On Mon, Mar 21, 2011 at 6:23 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Mar 21, 2011 at 3:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> On Tue, Mar 01, 2011 at 02:28:26PM +0200, Amir Goldstein wrote:
>>> +out_err:
>>> +     ext4_std_error(inode->i_sb, err);
>>> +     return err;
>>
>> Umm, you do realize this function will mark the file system as dirty,
>> and possibly remount the file system read-only, or panic the system,
>> right?
>
> I am counting on that. In fact I think we should not allow snapshots
> and errors=continue,
> otherwise we just as good (bad) as LVM snapshot that vanishes when it
> runs out of space.
>
>>
>> That's appropriate for normal journal failures (since there's no
>> recovering from them), but if you're proposing that a simple lack of
>> free blocks when doing a COW operation will result in a ENOSPC, all a
>> malicious userspace application needs to do to potentially bring the
>> system down is to attempt to unlink or truncate a file while COW
>> snapshots are enabled.....

And FYI, unlink/truncate of a large file doesn't take up a lot of disk
space: the data blocks are moved to the snapshot and only metadata needs
to be COWed, which should be accounted for by the snapshot disk space
reservation.

The worst DoS a malicious userspace application can cause is to
take up its own disk quota times the number of retained snapshots.
This is something that system administrators have to take into account
when mixing snapshots and quotas.
Just creating new files and deleting them does not make snapshot disk
usage grow.
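
The worst case Amir describes is simple arithmetic: a user can pin at most their quota once per retained snapshot. A small helper, with a hypothetical name, that computes that bound and reports overflow instead of wrapping:

```c
#include <limits.h>
#include <stdbool.h>

/* Worst-case space one user can pin: quota x retained snapshots.
 * Returns false if the product would overflow. */
static bool worst_case_pinned(unsigned long long quota_bytes,
			      unsigned long long snapshots,
			      unsigned long long *out)
{
	if (quota_bytes != 0 && snapshots > ULLONG_MAX / quota_bytes)
		return false;
	*out = quota_bytes * snapshots;
	return true;
}
```

For example, a 10 GiB quota with 8 retained snapshots gives a worst case of 80 GiB.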


Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 67e7a3c..13d3952 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4096,6 +4096,9 @@ no_top:
  *
  * We release `count' blocks on disk, but (last - first) may be greater
  * than `count' because there can be holes in there.
+ *
+ * Return 0 on success, 1 on invalid block range
+ * and < 0 on fatal error.
  */
 static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
 			     struct buffer_head *bh,
@@ -4122,25 +4125,21 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
 		if (bh) {
 			BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
 			err = ext4_handle_dirty_metadata(handle, inode, bh);
-			if (unlikely(err)) {
-				ext4_std_error(inode->i_sb, err);
-				return 1;
-			}
+			if (unlikely(err))
+				goto out_err;
 		}
 		err = ext4_mark_inode_dirty(handle, inode);
-		if (unlikely(err)) {
-			ext4_std_error(inode->i_sb, err);
-			return 1;
-		}
+		if (unlikely(err))
+			goto out_err;
 		err = ext4_truncate_restart_trans(handle, inode,
 						  blocks_for_truncate(inode));
-		if (unlikely(err)) {
-			ext4_std_error(inode->i_sb, err);
-			return 1;
-		}
+		if (unlikely(err))
+			goto out_err;
 		if (bh) {
 			BUFFER_TRACE(bh, "retaking write access");
-			ext4_journal_get_write_access(handle, bh);
+			err = ext4_journal_get_write_access(handle, bh);
+			if (unlikely(err))
+				goto out_err;
 		}
 	}

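The patch replaces four copies of the ext4_std_error()/return sequence with a single out_err label, the usual kernel idiom for a common error exit. A userspace sketch of the same shape, where step(), report_error() and the fail_step switch are hypothetical, and unlikely() is defined here with the GCC/Clang __builtin_expect builtin, as the kernel does:

```c
#include <errno.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)	/* GCC/Clang builtin */

static int fail_step = -1;	/* which step fails, for demonstration */
static int reported;		/* how many times the error path ran */

static int step(int n) { return n == fail_step ? -EIO : 0; }

static void report_error(int err)
{
	reported++;
	fprintf(stderr, "error %d\n", err);
}

static int restart_sequence(void)
{
	int err;

	err = step(0);		/* e.g. dirty the metadata buffer */
	if (unlikely(err))
		goto out_err;
	err = step(1);		/* e.g. mark the inode dirty */
	if (unlikely(err))
		goto out_err;
	err = step(2);		/* e.g. restart the transaction */
	if (unlikely(err))
		goto out_err;
	err = step(3);		/* e.g. retake write access (may COW) */
	if (unlikely(err))
		goto out_err;
	return 0;
out_err:
	report_error(err);	/* one place to report, like ext4_std_error() */
	return err;		/* negative errno: the caller treats it as fatal */
}
```

Keeping a single exit makes it hard to forget the error report on one path, which is how the unchecked ext4_journal_get_write_access() call slipped through before.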