Improve buffered streaming write ordering

On Mon, Oct 06, 2008 at 10:21:43AM -0400, Chris Mason wrote:
> On Mon, 2008-10-06 at 15:46 +0530, Aneesh Kumar K.V wrote:
> > On Fri, Oct 03, 2008 at 03:45:55PM -0400, Chris Mason wrote:
> > > On Fri, 2008-10-03 at 09:43 +1000, Dave Chinner wrote:
> > > > On Thu, Oct 02, 2008 at 11:48:56PM +0530, Aneesh Kumar K.V wrote:
> > > > > On Thu, Oct 02, 2008 at 08:20:54AM -0400, Chris Mason wrote:
> > > > > > On Wed, 2008-10-01 at 21:52 -0700, Andrew Morton wrote:
> > > > > > For a 4.5GB streaming buffered write, this printk inside
> > > > > > ext4_da_writepage shows up 37,2429 times in /var/log/messages.
> > > > > > 
> > > > > 
> > > > > Part of that can happen due to shrink_page_list -> pageout -> writepagee
> > > > > call back with lots of unallocated buffer_heads(blocks).
> > > > 
> > > > Quite frankly, a simple streaming buffered write should *never*
> > > > trigger writeback from the LRU in memory reclaim.
> > > 
> > > The blktrace runs on ext4 didn't show kswapd doing any IO.  It isn't
> > > clear if this is because ext4 did the redirty trick or if kswapd didn't
> > > call writepage.
> > > 
> > > -chris
> > 
> > This patch actually reduced the number of extents for the below test
> > from 564 to 171.
> > 
> 
> For my array, this patch brings the number of ext4 extents down from
> over 4000 to 27.  The throughput reported by dd goes up from ~80MB/s to
> 330MB/s, which means buffered IO is going as fast as O_DIRECT.
> 
> Here's the graph:
> 
> http://oss.oracle.com/~mason/bugs/writeback_ordering/ext4-aneesh.png
> 
> The strange metadata writeback for the uninit block groups is gone.
> 
> Looking at the patch, I think the ext4_writepages code should just make
> its own write_cache_pages.  It's pretty hard to follow the code that is
> there for ext4 vs the code that is there to make write_cache_pages do
> what ext4 expects it to.
> 

How about the below ? The patch is on top of ext4 patchqueue.

commit 9b839065f973eb68e8e28be18e1f31d8fb6854ee
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Tue Oct 7 12:27:52 2008 +0530

    ext4: Fix file fragmentation during large file write.

    The range_cyclic writeback mode use the address_space
    writeback_index as the start index for writeback. With
    delayed allocation we were updating writeback_index
    wrongly resulting in highly fragmented file. Number of
    extents reduced from 4000 to 27 for a 3GB file with
    the below patch.

    The patch also cleanup the ext4 delayed allocation writepages
    by implementing write_cache_pages locally with needed changes
    for cleanup. Also it drops the range_cont writeback mode added
    for ext4 delayed allocation writeback

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Improve buffered streaming write ordering

Commit Message

Comments

Patch