Message ID: 20201021015902.19223-1-matthew.ruffell@canonical.com
Series: bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled
On 21.10.20 03:58, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1898786
>
> [Impact]
>
> Systems that utilise bcache can experience extremely high IO wait times when
> under constant IO pressure. The IO wait times stay at a consistent 1 second,
> and never drop as long as the bcache shrinker is enabled.
>
> If you disable the shrinker, then IO wait drops significantly, back to normal
> levels.
>
> We did some perf analysis, and it seems we spend a huge amount of time in
> bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock".
>
> Looking at the recent commits in Bionic, we found the following commit merged
> in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream
> stable:
>
> commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b
> Author: Coly Li <colyli@suse.de>
> Date: Wed Nov 13 16:03:24 2019 +0800
> Subject: bcache: at least try to shrink 1 node in bch_mca_scan()
> Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b
>
> It mentions in the description that:
>
>> If sc->nr_to_scan is smaller than c->btree_pages, after the above
>> calculation, variable 'nr' will be 0 and nothing will be shrunk. It is
>> frequently observed that only 1 or 2 is set to sc->nr_to_scan and makes
>> nr to be zero. Then bch_mca_scan() will do nothing more than acquiring
>> and releasing mutex c->bucket_lock.
>
> This seems to be what is going on here, but the above commit only addresses
> the case when nr is 0.
>
> From what I can see, the problems we are experiencing occur when nr is 1 or 2:
> again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to
> release c->bucket_lock, since the shrinker loop never executes because there
> is no work to do.
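[Editorial note: the integer arithmetic described above can be sketched in a small Python model. This is a simplification of the C code in bch_mca_scan(), not the kernel source; the btree_pages value of 64 is purely illustrative (the real value depends on bucket size).]

```python
def nodes_to_scan(nr_to_scan: int, btree_pages: int,
                  at_least_one: bool = False) -> int:
    """Model the 'nr' calculation in bch_mca_scan().

    nr_to_scan comes from the kernel shrinker core and is frequently
    as small as 1 or 2; integer division by btree_pages then yields 0.
    """
    nr = nr_to_scan // btree_pages
    if at_least_one:
        # Models commit 9fcc34b1a6dd: "at least try to shrink 1 node".
        nr = nr or 1
    return nr

# Before the fix: a tiny scan request shrinks nothing, yet the call
# still acquires and releases c->bucket_lock.
assert nodes_to_scan(2, 64) == 0
# After the fix: at least one node is considered for shrinking.
assert nodes_to_scan(2, 64, at_least_one=True) == 1
```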
>
> [Fix]
>
> The following commits fix the problem, and all landed in 5.6-rc1:
>
> commit 125d98edd11464c8e0ec9eaaba7d682d0f832686
> Author: Coly Li <colyli@suse.de>
> Date: Fri Jan 24 01:01:40 2020 +0800
> Subject: bcache: remove member accessed from struct btree
> Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686
>
> commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795
> Author: Coly Li <colyli@suse.de>
> Date: Fri Jan 24 01:01:41 2020 +0800
> Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
> Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795
>
> commit e3de04469a49ee09c89e80bf821508df458ccee6
> Author: Coly Li <colyli@suse.de>
> Date: Fri Jan 24 01:01:42 2020 +0800
> Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan()
> Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6
>
> The first commit is a dependency of the other two. It removes the
> "recently accessed" marker, which indicates whether a particular cache node
> has been used recently and, if it has, keeps it from being considered for
> eviction. The commit mentions that under heavy IO, all cache nodes end up
> being marked recently accessed, and nothing is ever shrunk.
>
> The second commit changes a previous design decision of skipping the first
> 3 cache nodes when shrinking, since it is common for bch_mca_scan() to be
> called with nr being 1 or 2, just as 0 was common before the very first commit
> I mentioned. In that case the loop exits without doing anything, and we waste
> time waiting on locks, exactly as before. The fix is to try to shrink cache
> nodes from the tail of the list, not the beginning.
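[Editorial note: the head-versus-tail difference the second and third commits address can be illustrated with a simplified Python model. This is not the kernel code; the skip-the-first-3 behaviour is reduced to its essentials, and the node names are hypothetical.]

```python
def reap_skipping_head(cache: list, nr: int) -> list:
    """Old behaviour: walk from the head of c->btree_cache, but the
    first 3 slots are effectively skipped, so nr == 1 or 2 frees
    nothing (the walk ends before reaching a reapable node)."""
    freed = []
    for i, node in enumerate(cache):
        if i >= nr:          # scan budget exhausted
            break
        if i < 3:            # hard-coded skipped slots
            continue
        freed.append(node)
    return freed

def reap_from_tail(cache: list, nr: int) -> list:
    """New behaviour (d5c9c470 / e3de0446): take victims from the tail
    of the list, where the coldest nodes live, leaving the rest of the
    list order untouched."""
    return [cache[-(i + 1)] for i in range(min(nr, len(cache)))]

cache = ["n1", "n2", "n3", "n4", "n5"]
assert reap_skipping_head(cache, 2) == []        # nothing is freed
assert reap_from_tail(cache, 2) == ["n5", "n4"]  # tail nodes are reaped
```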
>
> The third commit fixes a minor issue: we don't want to re-arrange the linked
> list c->btree_cache, which is what the second commit ended up doing. Instead,
> we just shrink the cache node at the end of the linked list, without changing
> the order.
>
> One minor backport / context adjustment was required in the first commit for
> Bionic; the rest are all clean cherry picks to Bionic and Focal.
>
> [Testcase]
>
> This is hard to test, since the problem shows up in production environments
> that are under constant IO pressure, with many different items entering and
> leaving the cache.
>
> The Launchpad git server is currently suffering this issue, and has been
> sitting at a constant IO wait of 1 second or slightly over, which was causing
> slow response times and, in turn, build failures when git clones timed out.
>
> We installed a test kernel, which is available in the following PPA:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test
>
> Once the test kernel had been installed, IO wait times with the shrinker
> enabled dropped to normal levels, and the git server became responsive again.
> We have been monitoring the performance of the git server and watching IO wait
> times in grafana over the past week; everything looks good and indicates that
> these patches solve the issue.
>
> [Regression Potential]
>
> If a regression were to occur, it would only affect users of bcache who have
> the shrinker enabled, which I believe is the default setting in bcache
> deployments.
>
> In that case, users could disable the shrinker as a workaround until a fix is
> available. You can disable the shrinker with:
> sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"
>
> The patches remove the "recently accessed" marker, which slightly changes the
> behaviour when deciding what to evict from the cache.
> Some workloads that aren't under IO pressure may see a difference in which
> items are evicted from the cache, which may slightly change bcache's
> performance profile.
>
> Since the patches prune the cache from the tail of the linked list, we will
> likely end up pruning more nodes than before, since it was common to land on
> the first 3 cache nodes, which were hard-coded to be skipped and never pruned;
> this may also slightly change bcache's performance profile.
>
> Coly Li (3):
>   bcache: remove member accessed from struct btree
>   bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
>   bcache: reap from tail of c->btree_cache in bch_mca_scan()
>
>  drivers/md/bcache/btree.c | 24 ++++++++++--------------
>  drivers/md/bcache/btree.h |  2 --
>  2 files changed, 10 insertions(+), 16 deletions(-)

Testing looks good, and the patches are clean cherry picks/backports.

Formally, for future submissions, I would suggest always grouping multi-series
submissions together. In this case that would have made the list two patches
shorter, and as a reviewer one would not need to look twice. Things can be
prepared with a bit of creative file naming and adjusted subjects. For the
subjects:

[PATCH 1/3 B] ...
[PATCH 1/3 F] ...
[PATCH 2/3 B/F] ...
[PATCH 3/3 B/F] ...

But this time,

Acked-by: Stefan Bader <stefan.bader@canonical.com>
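[Editorial note: the workaround given in the [Regression Potential] section above can be sketched as follows. The path below is a stand-in temp file, since the real sysfs node, /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled, requires root and an actual bcache device (hence the `sudo bash -c` form in the cover letter).]

```shell
# Stand-in for /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled;
# substitute the real path (and run as root) on a live system.
node="/tmp/btree_shrinker_disabled"

echo 0 > "$node"   # default: shrinker enabled
echo 1 > "$node"   # workaround: shrinker disabled
cat "$node"        # shows 1 while the workaround is active
```

Writing 0 back to the same node re-enables the shrinker once a fixed kernel is installed.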
On 21/10/2020 02:58, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1898786
> [...]

Acked-by: Colin Ian King <colin.king@canonical.com>
Applied to Focal/master-next

Thanks,
Ian

On 2020-10-21 14:58:59, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1898786
> [...]