[RFC] Journal superblock update should send a barrier

Message ID	20100423002554.GB25327@quack.suse.cz
State	New, archived
Headers	show Return-Path: <linux-ext4-owner@vger.kernel.org> Date: Fri, 23 Apr 2010 02:25:55 +0200 From: Jan Kara <jack@suse.cz> To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu Subject: [RFC][PATCH] Journal superblock update should send a barrier Message-ID: <20100423002554.GB25327@quack.suse.cz> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="LQksG6bCIzRHxTLp" Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk

Message ID

20100423002554.GB25327@quack.suse.cz

State

New, archived

Headers

Date: Fri, 23 Apr 2010 02:25:55 +0200
From: Jan Kara <jack@suse.cz>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu
Subject: [RFC][PATCH] Journal superblock update should send a barrier
Message-ID: <20100423002554.GB25327@quack.suse.cz>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="LQksG6bCIzRHxTLp"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk

Commit Message

Jan Kara April 23, 2010, 12:25 a.m. UTC

Hi,

  while reading through the checkpointing code I've realized that we
actually have to send a barrier before each update of journal superblock
after checkpointing. Attached patch does this. Just I'm not sure whether
the performance cost won't be too big. In principle, we could make this
more lightweight by using the fact that transaction commit also sends the
barrier. So we could check before sending a barrier for transaction commit
whether we are slowly running out of journal space and if so whether some
transaction isn't already checkpointed. If yes, we can happily submit
update of journal superblock after the barrier. In case journal is decently
large this should solve the checkpointing problem without introducing
noticeable overhead...

								Honza

Comments

Jan Kara May 7, 2010, 3:53 p.m. UTC | #1

>   Hi,
> 
>   while reading through the checkpointing code I've realized that we
> actually have to send a barrier before each update of journal superblock
> after checkpointing. Attached patch does this. Just I'm not sure whether
> the performance cost won't be too big. In principle, we could make this
> more lightweight by using the fact that transaction commit also sends the
> barrier. So we could check before sending a barrier for transaction commit
> whether we are slowly running out of journal space and if so whether some
> transaction isn't already checkpointed. If yes, we can happily submit
> update of journal superblock after the barrier. In case journal is decently
> large this should solve the checkpointing problem without introducing
> noticeable overhead...
  Ping? Ted, any opinion?

								Honza

> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

> >From e749e83627c35c683114fea32695e581a487e560 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Fri, 23 Apr 2010 02:12:32 +0200
> Subject: [PATCH] ext4: Send barrier before updating journal superblock after checkpointing
> 
> We have to send a disk barrier after we have finished checkpointing and before
> we update the journal superblock and thus effectively remove transactions from
> the journal. Otherwise the write of journal superblock can be reordered before
> writes of checkpointed journal blocks and thus in case of crash these blocks
> needn't be on the platter leading to filesystem corruption.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/jbd2/checkpoint.c |   19 +++++++++----------
>  1 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 30beb11..b2de17f 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -519,17 +519,16 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
>  	spin_unlock(&journal->j_state_lock);
>  
>  	/*
> -	 * If there is an external journal, we need to make sure that
> -	 * any data blocks that were recently written out --- perhaps
> -	 * by jbd2_log_do_checkpoint() --- are flushed out before we
> -	 * drop the transactions from the external journal.  It's
> -	 * unlikely this will be necessary, especially with a
> -	 * appropriately sized journal, but we need this to guarantee
> -	 * correctness.  Fortunately jbd2_cleanup_journal_tail()
> -	 * doesn't get called all that often.
> +	 * We need to make sure that any data blocks that were recently written
> +	 * out --- perhaps by jbd2_log_do_checkpoint() --- are flushed out
> +	 * before we drop the transactions from the journal. Otherwise journal
> +	 * superblock write could be reordered before writeout of data and thus
> +	 * we could corrupt the filesystem in case of crash. It's unlikely this
> +	 * will be necessary, especially with a appropriately sized journal,
> +	 * but we need this to guarantee correctness.  Fortunately
> +	 * jbd2_cleanup_journal_tail() doesn't get called all that often.
>  	 */
> -	if ((journal->j_fs_dev != journal->j_dev) &&
> -	    (journal->j_flags & JBD2_BARRIER))
> +	if (journal->j_flags & JBD2_BARRIER)
>  		blkdev_issue_flush(journal->j_fs_dev, NULL);
>  	if (!(journal->j_flags & JBD2_ABORT))
>  		jbd2_journal_update_superblock(journal, 1);
> -- 
> 1.6.4.2
>

Jan Kara May 26, 2010, 1:58 p.m. UTC | #2

Hi Ted,

>   while reading through the checkpointing code I've realized that we
> actually have to send a barrier before each update of journal superblock
> after checkpointing. Attached patch does this. Just I'm not sure whether
> the performance cost won't be too big. In principle, we could make this
> more lightweight by using the fact that transaction commit also sends the
> barrier. So we could check before sending a barrier for transaction commit
> whether we are slowly running out of journal space and if so whether some
> transaction isn't already checkpointed. If yes, we can happily submit
> update of journal superblock after the barrier. In case journal is decently
> large this should solve the checkpointing problem without introducing
> noticeable overhead...
  Did you have a chance to look at this?

								Honza

> >From e749e83627c35c683114fea32695e581a487e560 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Fri, 23 Apr 2010 02:12:32 +0200
> Subject: [PATCH] ext4: Send barrier before updating journal superblock after checkpointing
> 
> We have to send a disk barrier after we have finished checkpointing and before
> we update the journal superblock and thus effectively remove transactions from
> the journal. Otherwise the write of journal superblock can be reordered before
> writes of checkpointed journal blocks and thus in case of crash these blocks
> needn't be on the platter leading to filesystem corruption.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/jbd2/checkpoint.c |   19 +++++++++----------
>  1 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 30beb11..b2de17f 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -519,17 +519,16 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
>  	spin_unlock(&journal->j_state_lock);
>  
>  	/*
> -	 * If there is an external journal, we need to make sure that
> -	 * any data blocks that were recently written out --- perhaps
> -	 * by jbd2_log_do_checkpoint() --- are flushed out before we
> -	 * drop the transactions from the external journal.  It's
> -	 * unlikely this will be necessary, especially with a
> -	 * appropriately sized journal, but we need this to guarantee
> -	 * correctness.  Fortunately jbd2_cleanup_journal_tail()
> -	 * doesn't get called all that often.
> +	 * We need to make sure that any data blocks that were recently written
> +	 * out --- perhaps by jbd2_log_do_checkpoint() --- are flushed out
> +	 * before we drop the transactions from the journal. Otherwise journal
> +	 * superblock write could be reordered before writeout of data and thus
> +	 * we could corrupt the filesystem in case of crash. It's unlikely this
> +	 * will be necessary, especially with a appropriately sized journal,
> +	 * but we need this to guarantee correctness.  Fortunately
> +	 * jbd2_cleanup_journal_tail() doesn't get called all that often.
>  	 */
> -	if ((journal->j_fs_dev != journal->j_dev) &&
> -	    (journal->j_flags & JBD2_BARRIER))
> +	if (journal->j_flags & JBD2_BARRIER)
>  		blkdev_issue_flush(journal->j_fs_dev, NULL);
>  	if (!(journal->j_flags & JBD2_ABORT))
>  		jbd2_journal_update_superblock(journal, 1);
> -- 
> 1.6.4.2
>

From e749e83627c35c683114fea32695e581a487e560 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 23 Apr 2010 02:12:32 +0200
Subject: [PATCH] ext4: Send barrier before updating journal superblock after checkpointing

We have to send a disk barrier after we have finished checkpointing and before
we update the journal superblock and thus effectively remove transactions from
the journal. Otherwise the write of journal superblock can be reordered before
writes of checkpointed journal blocks and thus in case of crash these blocks
needn't be on the platter leading to filesystem corruption.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/checkpoint.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 30beb11..b2de17f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -519,17 +519,16 @@  int jbd2_cleanup_journal_tail(journal_t *journal)
 	spin_unlock(&journal->j_state_lock);
 
 	/*
-	 * If there is an external journal, we need to make sure that
-	 * any data blocks that were recently written out --- perhaps
-	 * by jbd2_log_do_checkpoint() --- are flushed out before we
-	 * drop the transactions from the external journal.  It's
-	 * unlikely this will be necessary, especially with a
-	 * appropriately sized journal, but we need this to guarantee
-	 * correctness.  Fortunately jbd2_cleanup_journal_tail()
-	 * doesn't get called all that often.
+	 * We need to make sure that any data blocks that were recently written
+	 * out --- perhaps by jbd2_log_do_checkpoint() --- are flushed out
+	 * before we drop the transactions from the journal. Otherwise journal
+	 * superblock write could be reordered before writeout of data and thus
+	 * we could corrupt the filesystem in case of crash. It's unlikely this
+	 * will be necessary, especially with a appropriately sized journal,
+	 * but we need this to guarantee correctness.  Fortunately
+	 * jbd2_cleanup_journal_tail() doesn't get called all that often.
 	 */
-	if ((journal->j_fs_dev != journal->j_dev) &&
-	    (journal->j_flags & JBD2_BARRIER))
+	if (journal->j_flags & JBD2_BARRIER)
 		blkdev_issue_flush(journal->j_fs_dev, NULL);
 	if (!(journal->j_flags & JBD2_ABORT))
 		jbd2_journal_update_superblock(journal, 1);
-- 
1.6.4.2

[RFC] Journal superblock update should send a barrier

Commit Message

Comments

Patch