Message ID | 7398CEE9-AF68-4A2A-82E4-940FADF81F97@gmail.com |
---|---|
State | New, archived |
Headers | show |
On Thu, Apr 25, 2013 at 09:37:07PM +0300, Alexey Lyahkov wrote: > Mel, > > > On Apr 25, 2013, at 17:30, Mel Gorman wrote: > > > On Wed, Apr 24, 2013 at 10:26:50AM -0400, Theodore Ts'o wrote: > >> On Tue, Apr 23, 2013 at 03:00:08PM -0700, Andrew Morton wrote: > >>> That should fix things for now. Although it might be better to just do > >>> > >>> mark_page_accessed(page); /* to SetPageReferenced */ > >>> lru_add_drain(); /* to SetPageLRU */ > >>> > >>> Because a) this was too early to decide that the page is > >>> super-important and b) the second touch of this page should have a > >>> mark_page_accessed() in it already. > >> > >> The question is do we really want to put lru_add_drain() into the ext4 > >> file system code? That seems to pushing some fairly mm-specific > >> knowledge into file system code. I'll do this if I have to do, but > >> wouldn't be better if this was pushed into mark_page_accessed(), or > >> some other new API was exported by the mm subsystem? > >> > > > > I don't think we want to push lru_add_drain() into the ext4 code. It's > > too specific of knowledge just to work around pagevecs. Before we rework > > how pagevecs select what LRU to place a page, can we make sure that fixing > > that will fix the problem? > > > what is "that"? puting lru_add_drain() in ext4 core? sure that is fixes problem with many small reads during large write. > originally i have put shake_page() in ext4 code, but that have call lru_add_drain_all() so to exaggerated. > No, I would prefer if this was not fixed within ext4. I need confirmation that fixing mark_page_accessed() addresses the performance problem you encounter. The two-line check for PageLRU() followed by a lru_add_drain() is meant to check that. That is still not my preferred fix because even if you do not encounter higher LRU contention, other workloads would be at risk. The likely fix will involve converting pagevecs to using a single list and then selecting what LRU to put a page on at drain time but I want to know that it's worthwhile. Using shake_page() in ext4 is certainly overkill. > > Andrew, can you try the following patch please? Also, is there any chance > > you can describe in more detail what the workload does? > > lustre OSS node + IOR with file size twice more then OSS memory. > Ok, no way I'll be reproducing that workload. Thanks.
On Apr 26, 2013, at 01:40, Mel Gorman wrote: > No, I would prefer if this was not fixed within ext4. I need confirmation > that fixing mark_page_accessed() addresses the performance problem you > encounter. The two-line check for PageLRU() followed by a lru_add_drain() > is meant to check that. That is still not my preferred fix because even > if you do not encounter higher LRU contention, other workloads would be > at risk. The likely fix will involve converting pagevecs to using a single > list and then selecting what LRU to put a page on at drain time but I > want to know that it's worthwhile. > > Using shake_page() in ext4 is certainly overkill. agree, but it's was my prof of concept patch :) just to verify founded > >>> Andrew, can you try the following patch please? Also, is there any chance >>> you can describe in more detail what the workload does? >> >> lustre OSS node + IOR with file size twice more then OSS memory. >> > > Ok, no way I'll be reproducing that workload. Thanks. > I think you should be try several processes with DIO (so don't put any pages in lru_pagevec as that is heap), each have a filesize twice or more of available memory. Main idea you should be have a read a new pages in budy cache (to allocate) and have large memory allocation in same time. DIO chunk should be enough to start streaming allocation. also you may use attached jprobe module to hit an BUG() if buddy page removed from a memory by shrinker.
Index: linux-stage/fs/ext4/mballoc.c =================================================================== --- linux-stage.orig/fs/ext4/mballoc.c 2013-03-19 10:55:52.000000000 +0200 +++ linux-stage/fs/ext4/mballoc.c 2013-03-19 10:59:02.000000000 +0200 @@ -900,8 +900,11 @@ static int ext4_mb_init_cache(struct pag incore = data; } } - if (likely(err == 0)) + if (likely(err == 0)) { SetPageUptodate(page); + /* make sure it's in active list */ + mark_page_accessed(page); + } out: if (bh) { @@ -957,6 +960,8 @@ int ext4_mb_init_group(struct super_bloc page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); if (page) { BUG_ON(page->mapping != inode->i_mapping); + /* move to lru - should be lru_add_drain() */ + shake_page(page, 0); ret = ext4_mb_init_cache(page, NULL); if (ret) { unlock_page(page); @@ -986,6 +991,8 @@ int ext4_mb_init_group(struct super_bloc unlock_page(page); } else if (page) { BUG_ON(page->mapping != inode->i_mapping); + /* move to lru - should be lru_add_drain() */ + shake_page(page, 0); ret = ext4_mb_init_cache(page, bitmap); if (ret) { unlock_page(page); @@ -1087,6 +1094,7 @@ repeat_load_buddy: if (page) { BUG_ON(page->mapping != inode->i_mapping); if (!PageUptodate(page)) { + shake_page(page, 0); ret = ext4_mb_init_cache(page, NULL); if (ret) { unlock_page(page); @@ -1118,6 +1126,7 @@ repeat_load_buddy: if (page) { BUG_ON(page->mapping != inode->i_mapping); if (!PageUptodate(page)) { + shake_page(page, 0); ret = ext4_mb_init_cache(page, e4b->bd_bitmap); if (ret) { unlock_page(page); @@ -2500,6 +2509,8 @@ static int ext4_mb_init_backend(struct s * not in the inode hash, so it should never be found by iget(), but * this will avoid confusion if it ever shows up during debugging. */ sbi->s_buddy_cache->i_ino = EXT4_BAD_INO; + sbi->s_buddy_cache->i_state = I_NEW; +// mapping_set_unevictable(sbi->s_buddy_cache->i_mapping); EXT4_I(sbi->s_buddy_cache)->i_disksize = 0; for (i = 0; i < ngroups; i++) { desc = ext4_get_group_desc(sb, i, NULL);