Message ID: 1407382553-24256-1-git-send-email-wenqing.lz@taobao.com
State:      Superseded, archived
Hi Zheng,

It's been a while since you've posted a revised version of this patch
series. I believe Jan had suggested some changes which you said you
would fix in the next set of patches. Do you know when that might be
done?

Note that the first two patches in this series have already been queued
for Linus for this merge window.

Thanks,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

On Mon 20-10-14 10:48:49, Ted Tso wrote:
> It's been a while since you've posted a revised version of this patch
> series. I believe Jan had suggested some changes which you said you
> would fix in the next set of patches. Do you know when you might be
> able that might be done?
Yeah, actually if you don't have time just say so. I can push it to
completion somehow (there wasn't that much work left). I need the thing
completed reasonably quickly since I already have some user reports for
openSUSE and I'm somewhat afraid reports for SLES will come pretty soon and
I cannot leave those unresolved for long...

								Honza
Hi Ted, Jan and others,

I am deeply sorry for the delay in my work. I don't have any objection to
Jan's suggestions. There is still some other work keeping me busy, and I
can see that I won't have time to finish this for the current merge
window. It's a shame!

Jan, I would really appreciate it if you are willing to push this patch
set to completion. Thanks!!!

Regards,
- Zheng

> On Oct 21, 2014, at 6:22 PM, Jan Kara <jack@suse.cz> wrote:
>
> Hello,
>
> On Mon 20-10-14 10:48:49, Ted Tso wrote:
>> It's been a while since you've posted a revised version of this patch
>> series. I believe Jan had suggested some changes which you said you
>> would fix in the next set of patches. Do you know when you might be
>> able that might be done?
> Yeah, actually if you don't have time just say so. I can push it to
> completion somehow (there wasn't that much work left). I need the thing
> completed reasonably quickly since I already have some user reports for
> openSUSE and I'm somewhat afraid reports for SLES will come pretty soon and
> I cannot leave those unresolved for long...
>
> 								Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
Hello,

On Tue 21-10-14 23:58:10, 刘峥(文卿) wrote:
> I deeply sorry for this because of my delay work. I don’t have any objection
> for Jan’s suggestions. Until now there are still some works that push me
> tough, and I can see that I don’t have time to finish it at this merge
> window. It’s a shame for me!
>
> Jan, I really really appreciate if you are willing to push this patch set
> to completion. Thanks!!!
OK, I have updated the patches according to the review Ted and I did. It
survives a basic fsstress run. How were you testing your patches? I should
probably also gather some statistics etc...

								Honza

> > On Oct 21, 2014, at 6:22 PM, Jan Kara <jack@suse.cz> wrote:
> >
> > Hello,
> >
> > On Mon 20-10-14 10:48:49, Ted Tso wrote:
> >> It's been a while since you've posted a revised version of this patch
> >> series. I believe Jan had suggested some changes which you said you
> >> would fix in the next set of patches. Do you know when you might be
> >> able that might be done?
> > Yeah, actually if you don't have time just say so. I can push it to
> > completion somehow (there wasn't that much work left). I need the thing
> > completed reasonably quickly since I already have some user reports for
> > openSUSE and I'm somewhat afraid reports for SLES will come pretty soon and
> > I cannot leave those unresolved for long...
> >
> > 								Honza
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
On Mon, Nov 03, 2014 at 05:10:46PM +0100, Jan Kara wrote:
> Hello,
>
> On Tue 21-10-14 23:58:10, 刘峥(文卿) wrote:
> > I deeply sorry for this because of my delay work. I don’t have any objection
> > for Jan’s suggestions. Until now there are still some works that push me
> > tough, and I can see that I don’t have time to finish it at this merge
> > window. It’s a shame for me!
> >
> > Jan, I really really appreciate if you are willing to push this patch set
> > to completion. Thanks!!!
> OK, I have updated the patches according to the review I and Ted did. It
> survives basic fsstress run. How were you testing your patches? I should
> probably also gather some statistics etc...

Hi Jan,

I was wondering how your updated extent status tree shrinker patches are
going? Are you comfortable sending them out for review/merging?

Sorry for nagging, but we're already at 3.18-rc4, and it would be great if
we could get these merged for this merge window.

Thanks!

					- Ted
=====
Here is the third version of the extent status tree shrinker
improvements. Sorry for the very late work.

Thanks to Ted's and Dave's suggestions, I have revised my work. The
extent status tree shrinker needs to address two issues: one is how to
reclaim objects from the tree as quickly as possible, and the other is
how to keep as many useful extent caches in the tree as possible. The
first issue can be divided into two problems: a) how to scan the list
that tracks all inodes and b) how to reclaim objects from an inode. This
patch set tries to fix these issues.

In this patch set, the first two patches just do some cleanups and add
some statistics for measuring the improvements.

Patch 3 makes the extent status tree cache extent holes in the delalloc
code path to improve the cache hit ratio. Currently we don't cache an
extent hole when we do a delalloc write because this extent hole might
soon be converted into a delayed extent. The defect is that we can miss
some extent holes in the extent cache, which means that subsequent
writes must look up the extent tree again to determine whether a block
has been allocated or not.

Patch 4 makes the shrinker replace the LRU list with a plain list to
track all inodes. The shrinker then uses a round-robin algorithm to scan
this list. The reason for discarding the LRU list in favor of
round-robin is that we no longer spend time maintaining LRU ordering,
which saves scan time. Meanwhile, this commit shrinks the critical
section, making the locking more fine-grained so that other processes
don't have to wait on the lock for a long time. The improvement can be
seen in test case (b).

Patch 5 uses a list to track all reclaimable objects in an inode to
speed up reclaim. Currently, when the shrinker tries to reclaim objects
it must scan the inode's rb-tree, and this rb-tree contains some
non-reclaimable objects (delayed extents, which ext4 uses for seeking
data/holes, finding delayed ranges, etc.).
That means the shrinker must spend time skipping these non-reclaimable
objects during the scan. After applying this patch the shrinker can
reclaim objects directly. The improvement can be seen in test case (a).

Patch 6 improves the list that tracks all reclaimable objects in order
to raise the cache hit ratio. After patch 5, the extent cache can be
flushed by a streaming workload because we have no way to recognize
which entries should be kept. This commit splits the list into an active
list and an inactive list, and defines a new flag called '_ACCESSED'.
When an extent cache is accessed, this flag is set; when the shrinker
encounters the flag while scanning the list, that extent cache is moved
to the tail of the active list. Active extent caches are reclaimed only
when we are under high memory pressure and the shrinker did not reclaim
any object in the first round. The improvement can be seen in test
case (c).

Environment
===========
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               2400.000
BogoMIPS:              4799.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-3,8-11
NUMA node1 CPU(s):     4-7,12-15

$ cat /proc/meminfo
MemTotal:       24677988 kB

$ df -ah
/dev/sdb1       453G   51G  380G  12% /mnt/sdb1 (HDD)

Test Case
=========

Test Case (a)
-------------
Test case (a) is used to create a huge number of extent caches in 500
inodes. Running this test, we get a very big rb-tree on every inode. The
purpose is to measure the latency when the shrinker reclaims objects
from an inode.
Fio script:

[global]
ioengine=psync
bs=4k
directory=/mnt/sdb1
group_reporting
fallocate=0
direct=0
filesize=100000g
size=600000g
runtime=300
create_on_open=1
create_serialize=0
create_fsync=0
norandommap

[io]
rw=write
numjobs=100
nrfiles=5

Test Case (b)
-------------
Test case (b) is used to create a very long list in the super block so
that we can measure the latency when the shrinker scans the list.

Fio script:

[global]
ioengine=psync
bs=4k
directory=/mnt/sdb1
group_reporting
fallocate=0
direct=0
runtime=300
create_on_open=1
create_serialize=0
create_fsync=0
norandommap

[io]
filesize=100000g
size=600000g
rw=write
numjobs=100
nrfiles=5
openfiles=10000

[rand]
filesize=1000g
size=600000g
rw=write:4k
numjobs=20
nrfiles=20000

Test Case (c)
-------------
Test case (c) is used to measure the cache hit/miss ratio. *NOTE* I
reduced the memory to 12g in order to make the shrinker work more
aggressively.

Fio script:

[global]
ioengine=psync
bs=4k
directory=/mnt/sdb1
group_reporting
direct=0
runtime=300

[read]
rw=read
numjobs=1
nrfiles=50
filesize=1g
size=600000g

[stream]
filesize=1000g
size=600000g
rw=write
numjobs=100
nrfiles=5
create_on_open=1
create_serialize=0
create_fsync=0

Note 1
------
*To get a very fragmented extent status tree, I use the following patch
to disallow merging of extent caches:*

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index b6c3366..f8c693e 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -351,6 +351,7 @@ static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
 static int ext4_es_can_be_merged(struct extent_status *es1,
 				 struct extent_status *es2)
 {
+#if 0
 	if (ext4_es_status(es1) != ext4_es_status(es2))
 		return 0;