Message ID | 20081006101605.GA15881@skywalker |
---|---|
State | Superseded, archived |
Headers | show |
On Mon, 2008-10-06 at 15:46 +0530, Aneesh Kumar K.V wrote: > On Fri, Oct 03, 2008 at 03:45:55PM -0400, Chris Mason wrote: > > On Fri, 2008-10-03 at 09:43 +1000, Dave Chinner wrote: > > > On Thu, Oct 02, 2008 at 11:48:56PM +0530, Aneesh Kumar K.V wrote: > > > > On Thu, Oct 02, 2008 at 08:20:54AM -0400, Chris Mason wrote: > > > > > On Wed, 2008-10-01 at 21:52 -0700, Andrew Morton wrote: > > > > > For a 4.5GB streaming buffered write, this printk inside > > > > > ext4_da_writepage shows up 37,2429 times in /var/log/messages. > > > > > > > > > > > > > Part of that can happen due to shrink_page_list -> pageout -> writepagee > > > > call back with lots of unallocated buffer_heads(blocks). > > > > > > Quite frankly, a simple streaming buffered write should *never* > > > trigger writeback from the LRU in memory reclaim. > > > > The blktrace runs on ext4 didn't show kswapd doing any IO. It isn't > > clear if this is because ext4 did the redirty trick or if kswapd didn't > > call writepage. > > > > -chris > > This patch actually reduced the number of extents for the below test > from 564 to 171. > For my array, this patch brings the number of ext4 extents down from over 4000 to 27. The throughput reported by dd goes up from ~80MB/s to 330MB/s, which means buffered IO is going as fast as O_DIRECT. Here's the graph: http://oss.oracle.com/~mason/bugs/writeback_ordering/ext4-aneesh.png The strange metadata writeback for the uninit block groups is gone. Looking at the patch, I think the ext4_writepages code should just make its own write_cache_pages. It's pretty hard to follow the code that is there for ext4 vs the code that is there to make write_cache_pages do what ext4 expects it to. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 21f1d3a..58d010d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2386,14 +2386,16 @@ static int ext4_da_writepages(struct address_space *mapping, wbc->nr_to_write = sbi->s_mb_stream_request; } - if (!wbc->range_cyclic) + if (wbc->range_cyclic) { + range_start = mapping->writeback_index; + } else { /* * If range_cyclic is not set force range_cont - * and save the old writeback_index + * and save the old range_start; */ wbc->range_cont = 1; - - range_start = wbc->range_start; + range_start = wbc->range_start; + } pages_skipped = wbc->pages_skipped; mpd.wbc = wbc; @@ -2440,6 +2442,19 @@ static int ext4_da_writepages(struct address_space *mapping, */ to_write += wbc->nr_to_write; ret = 0; + if (pages_skipped != wbc->pages_skipped) { + /* writepages skipped some pages */ + if (wbc->range_cont) { + wbc->range_start = range_start; + } else { + /* range_cyclic */ + mapping->writeback_index = range_start; + } + wbc->nr_to_write = to_write + + (wbc->pages_skipped - pages_skipped); + wbc->pages_skipped = pages_skipped; + } else + wbc->nr_to_write = to_write; } else if (wbc->nr_to_write) { /* * There is no more writeout needed @@ -2449,18 +2464,27 @@ static int ext4_da_writepages(struct address_space *mapping, to_write += wbc->nr_to_write; break; } - wbc->nr_to_write = to_write; } if (wbc->range_cont && (pages_skipped != wbc->pages_skipped)) { /* We skipped pages in this loop */ wbc->range_start = range_start; wbc->nr_to_write = to_write + - wbc->pages_skipped - pages_skipped; + (wbc->pages_skipped - pages_skipped); wbc->pages_skipped = pages_skipped; goto restart_loop; } + if (wbc->range_cyclic && (pages_skipped != wbc->pages_skipped)) { + /* + * we need to make sure we don't move the + * writeback_index further without looking + * at the pages skipped. + */ + mapping->writeback_index = mapping->writeback_index - + (wbc->pages_skipped - pages_skipped); + } + out_writepages: wbc->nr_to_write = to_write - nr_to_writebump; wbc->range_start = range_start;