Message ID | 1275484147-26044-1-git-send-email-stefan.bader@canonical.com |
---|---|
State | Rejected |
Delegated to: | Stefan Bader |
Ok, so upstream seems to have officially reverted the changes for now. Given
that the changes in that area seem to have unexpected side effects, we do not
really know that the other patch would not do the same. When reverting the
work-around we got huge sync times. IIRC the side effects of that patch were
not as bad timewise (2 seconds?). So maybe for now, we stay with what we got
and wait for the proper upstream solution?

Stefan

On 06/02/2010 03:09 PM, Stefan Bader wrote:
> This is a followup on a previous post which basically consists of the
> revert of 5e1941884c700b7b97bcad52c2d8212ac56a7ebc
> "UBUNTU: SAUCE: sync before umount to reduce time taken by ext4 umount"
> and then applying a patch from bugzilla
> "writeback: fix __sync_filesystem(sb, 0) on umount"
>
> However the second patch never went upstream, while the following two
> (backported to 2.6.32.y) made it. Sadly, when asking Jens Axboe about
> forwarding those to stable, I got the following response:
>
> Jens Axboe wrote:
>>> I would send the backports to stable but wanted to check with you
>>> beforehand. Would you be ok with a stable submission of these two?
>>
>> I would have said yes a few days ago, but Christoph claims that the
>> patches make xfs test hang. Unfortunately I'm a bit between jobs and
>> don't have proper testing equipment right now, so I had no other option
>> than revert exactly those two commits...
>
> Now the question is whether we want to go ahead (because we need to
> do the revert as that causes other problems) with the upstream version
> that might cause xfs problems (assuming xfs is not our main fs) or
> take the patch from bugzilla which might have the same problem or others
> yet unknown.
>
> -Stefan
>
> ----
>
> From: Dmitry Monakhov <dmonakhov@openvz.org>
>
> BugLink: http://launchpad.net/bugs/543617
>
> __sync_filesystem(sb, 0) is no longer works on umount because sb can
> not be pined, Because s_mount sem is downed for write and s_root is NULL.
> In fact umount is a special case similar to WB_SYNC_ALL, where
> sb is pinned already.
>
> BADCOMMIT: 03ba3782e8dcc5b0e1efe440d33084f066e38cae
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> (cherry-picked from https://bugzilla.kernel.org/attachment.cgi?id=26224)
>
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
> ---
>  fs/fs-writeback.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 4102f20..937a8ae 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -583,9 +583,9 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
>  	unpin_sb_for_writeback(psb);
>
>  	/*
> -	 * Caller must already hold the ref for this
> +	 * Caller must already hold the ref for integrity sync, and on umount
>  	 */
> -	if (wbc->sync_mode == WB_SYNC_ALL) {
> +	if (wbc->sync_mode == WB_SYNC_ALL || !sb->s_root) {
>  		WARN_ON(!rwsem_is_locked(&sb->s_umount));
>  		return 0;
>  	}
> @@ -597,6 +597,7 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
>  		spin_unlock(&sb_lock);
>  		goto pinned;
>  	}
> +	WARN_ON(1);
>  	/*
>  	 * umounted, drop rwsem again and fall through to failure
>  	 */
Hi Stefan,

On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
> Ok, so upstream seems to have officially reverted the changes for now. Given
> that the changes in that area seem to have unexpected side effects, we do not
> really know that the other patch would not do the same. When reverting the
> work-around we got huge sync times. IIRC the side effects of that patch were
> not as bad timewise (2 seconds?). So maybe for now, we stay with what we got
> and wait for the proper upstream solution?

Upstream gave up on fixing the problem? Argh.

Huge sync times in Lucid? When what was changed?

I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
if there isn't a good fix for the umount slow-down since it solves the
slow umount only under certain conditions, and makes other more common
umount scenarios slower.

-Kees
On 06/09/2010 09:06 PM, Kees Cook wrote:
> Hi Stefan,
>
> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>> Ok, so upstream seems to have officially reverted the changes for now. Given
>> that the changes in that area seem to have unexpected side effects, we do not
>> really know that the other patch would not do the same. When reverting the
>> work-around we got huge sync times. IIRC the side effects of that patch were
>> not as bad timewise (2 seconds?). So maybe for now, we stay with what we got
>> and wait for the proper upstream solution?
>
> Upstream gave up on fixing the problem? Argh.

No, not given up, but they currently need to come up with a better solution.

> Huge sync times in Lucid? When what was changed?

Err, wasn't that the reason for the patch below in the first place? Maybe my
description was not completely accurate. It took a very long time to unmount.
I might be wrong, but in my memory that was as bad as 30 minutes in bad cases.
The reason is that the sync is done synchronously.

> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
> if there isn't a good fix for the umount slow-down since it solves the
> slow umount only under certain conditions, and makes other more common
> umount scenarios slower.

The question would be how much slower. If it is possible to live with the
current status a little longer, then I'd rather wait for the correct upstream
fix than revert this patch and see one regression fixed by another one coming
back.

Stefan

> -Kees
On Thu, Jun 10, 2010 at 08:57:55AM +0200, Stefan Bader wrote:
> Err, wasn't that the reason for the patch below in the first place? Maybe my
> description was not completely accurate. It took a very long time to unmount.
> I might be wrong, but in my memory that was as bad as 30 minutes in bad cases.
> The reason is that the sync is done synchronously.

Without the forced-sync-on-umount work-around patch:

- umount of a single filesystem with barrier support on an idle system
  can take upwards of 15 minutes.
- umount of a single filesystem with barrier support on a busy system
  can take upwards of 4 hours.
- umount of everything else is fast and normal (less than a hundredth of
  a second to umount a bind mount), regardless of system IO load.

With the patch:

- umount of a single filesystem with barrier support on an idle system
  will take as long as a normal sync and umount (i.e. fast and normal).
- umount of everything else on an idle system takes as long as a sync
  and umount (raises trivial bind umounts to at least 1 second).
- umount of everything else on a busy system takes as long as a sync of
  the entire system and umount of the filesystem (which can sometimes
  take upwards of 4 hours due to the original problem of dirty
  umounting of filesystems with barrier support).

The addition of the sync workaround means that busy systems will have umount
blocked for as long as _all_ sync activity takes on the given system.

I'll leave it up to you, but for me in most cases, the patch made my system
significantly worse than without it.

-Kees
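To make the comparison above concrete, the two costs that the workaround couples together can be timed separately from user space. The sketch below is illustrative only: it assumes an existing bind mount at the made-up path /mnt/bind-test, needs root, and the sync(2) time naturally depends on how much dirty data the whole system is holding at that moment.

```c
/* Illustrative sketch: time a system-wide sync(2) and the umount(2) of a
 * trivial bind mount separately.  The mount point path is an example and
 * must exist; run as root. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/mount.h>

static double elapsed(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
	const char *bind_mnt = "/mnt/bind-test";   /* assumed pre-existing bind mount */
	struct timespec t0, t1, t2;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	sync();                                    /* global sync: cost scales with system-wide dirty data */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (umount2(bind_mnt, 0) != 0)             /* trivial umount: normally near-instant */
		perror("umount2");
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("sync:   %.3f s\n", elapsed(t0, t1));
	printf("umount: %.3f s\n", elapsed(t1, t2));
	return 0;
}
```

With the sync-before-umount workaround in the kernel, the umount effectively pays both costs at once, which is the coupling Kees measures above.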
On Wed, Jun 9, 2010 at 8:06 PM, Kees Cook <kees.cook@canonical.com> wrote:
> Hi Stefan,
>
> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>> Ok, so upstream seems to have officially reverted the changes for now. Given
>> that the changes in that area seem to have unexpected side effects, we do not
>> really know that the other patch would not do the same. When reverting the
>> work-around we got huge sync times. IIRC the side effects of that patch were
>> not as bad timewise (2 seconds?). So maybe for now, we stay with what we got
>> and wait for the proper upstream solution?
>
> Upstream gave up on fixing the problem? Argh.
>
> Huge sync times in Lucid? When what was changed?
>
> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
> if there isn't a good fix for the umount slow-down since it solves the
> slow umount only under certain conditions, and makes other more common
> umount scenarios slower.

Dave Chinner just committed a ratified fix [1] for this upstream,
which may be a good cherry-pick.

Dan

--- [1]

commit d87815cb2090e07b0b0b2d73dc9740706e92c80c
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Jun 9 10:37:20 2010 +1000

    writeback: limit write_cache_pages integrity scanning to current EOF

    sync can currently take a really long time if a concurrent writer is
    extending a file. The problem is that the dirty pages on the address
    space grow in the same direction as write_cache_pages scans, so if
    the writer keeps ahead of writeback, the writeback will not
    terminate until the writer stops adding dirty pages.

    For a data integrity sync, we only need to write the pages dirty at
    the time we start the writeback, so we can stop scanning once we get
    to the page that was at the end of the file at the time the scan
    started.

    This will prevent operations like copying a large file preventing
    sync from completing as it will not write back pages that were
    dirtied after the sync was started. This does not impact the
    existing integrity guarantees, as any dirty page (old or new)
    within the EOF range at the start of the scan will still be
    captured.

    This patch will not prevent sync from blocking on large writes into
    holes. That requires more complex intervention while this patch
    only addresses the common append-case of this sync holdoff.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
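The hold-off described in that commit message can be provoked from user space with a small test like the one below: a child process keeps appending to a file while the parent times a system-wide sync(2). This is only a sketch of the scenario; the file name and write volume are arbitrary choices, and how long the sync actually takes depends on the kernel and storage involved.

```c
/* Sketch: a concurrent writer keeps extending a file while the parent
 * runs sync(2), illustrating the integrity-sync hold-off described above.
 * File name and write volume (about 2 GiB) are arbitrary. */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	pid_t pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}

	if (pid == 0) {
		/* Child: append continuously so the file keeps growing. */
		char buf[1 << 16];
		int fd = open("growing-file", O_WRONLY | O_CREAT | O_APPEND, 0644);
		if (fd < 0) { perror("open"); _exit(1); }
		memset(buf, 'x', sizeof(buf));
		for (long i = 0; i < (1L << 15); i++)   /* bounded, so the disk does not fill up */
			if (write(fd, buf, sizeof(buf)) < 0)
				_exit(1);
		_exit(0);
	}

	sleep(2);                 /* let the writer dirty some pages first */

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	sync();                   /* the data-integrity sync racing the writer */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("sync took %.3f s\n",
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

	kill(pid, SIGKILL);       /* stop the writer and clean up */
	waitpid(pid, NULL, 0);
	unlink("growing-file");
	return 0;
}
```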
On 06/12/2010 04:18 PM, Daniel J Blueman wrote:
> On Wed, Jun 9, 2010 at 8:06 PM, Kees Cook <kees.cook@canonical.com> wrote:
>> Hi Stefan,
>>
>> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>>> Ok, so upstream seems to have officially reverted the changes for now. Given
>>> that the changes in that area seem to have unexpected side effects, we do not
>>> really know that the other patch would not do the same. When reverting the
>>> work-around we got huge sync times. IIRC the side effects of that patch were
>>> not as bad timewise (2 seconds?). So maybe for now, we stay with what we got
>>> and wait for the proper upstream solution?
>>
>> Upstream gave up on fixing the problem? Argh.
>>
>> Huge sync times in Lucid? When what was changed?
>>
>> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
>> if there isn't a good fix for the umount slow-down since it solves the
>> slow umount only under certain conditions, and makes other more common
>> umount scenarios slower.
>
> Dave Chinner just committed a ratified fix [1] for this upstream,
> which may be a good cherry-pick.
>
> Dan

Hi Daniel,

from the description, this looks like an additional corner case (I have not
yet had time to look at the actual patch). The case we are facing is that the
sync on umount used to get converted into an asynchronous sync, followed by a
synchronous one. This changed in a way that the sync on umount is done only
synchronously, which waits on every single page.

But the patch you pointed to is definitely something to keep in mind as well.

Cheers,
Stefan

> --- [1]
>
> commit d87815cb2090e07b0b0b2d73dc9740706e92c80c
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Wed Jun 9 10:37:20 2010 +1000
>
>     writeback: limit write_cache_pages integrity scanning to current EOF
>
>     sync can currently take a really long time if a concurrent writer is
>     extending a file. The problem is that the dirty pages on the address
>     space grow in the same direction as write_cache_pages scans, so if
>     the writer keeps ahead of writeback, the writeback will not
>     terminate until the writer stops adding dirty pages.
>
>     For a data integrity sync, we only need to write the pages dirty at
>     the time we start the writeback, so we can stop scanning once we get
>     to the page that was at the end of the file at the time the scan
>     started.
>
>     This will prevent operations like copying a large file preventing
>     sync from completing as it will not write back pages that were
>     dirtied after the sync was started. This does not impact the
>     existing integrity guarantees, as any dirty page (old or new)
>     within the EOF range at the start of the scan will still be
>     captured.
>
>     This patch will not prevent sync from blocking on large writes into
>     holes. That requires more complex intervention while this patch
>     only addresses the common append-case of this sync holdoff.
>
>     Signed-off-by: Dave Chinner <dchinner@redhat.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
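The "asynchronous pass followed by a synchronous pass" Stefan refers to is kernel-internal writeback behaviour (a WB_SYNC_NONE pass followed by a WB_SYNC_ALL pass) and cannot be reproduced directly from user space. The sketch below only illustrates the same start-then-wait idea for a single already-dirty file, using sync_file_range(2) to kick off writeback without blocking and fsync(2) to wait for it; the file path is just an example.

```c
/* Illustration of the start-then-wait pattern: kick off writeback
 * without blocking, then wait for completion.  This is a user-space
 * analogue only, not the kernel's umount-time writeback path. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "dirty-file";  /* example path */
	int fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Pass 1: start writeback for the whole file, do not wait. */
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
		perror("sync_file_range");

	/* Pass 2: wait until the data (and metadata) has reached disk. */
	if (fsync(fd) != 0)
		perror("fsync");

	close(fd);
	return 0;
}
```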
The patches discussed here were causing regressions and have been reverted
upstream. By now there is a new (and even bigger) patch set. I am waiting for
some feedback on a backport of it that I did from upstream for 2.6.32.y.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4102f20..937a8ae 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -583,9 +583,9 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
 	unpin_sb_for_writeback(psb);
 
 	/*
-	 * Caller must already hold the ref for this
+	 * Caller must already hold the ref for integrity sync, and on umount
 	 */
-	if (wbc->sync_mode == WB_SYNC_ALL) {
+	if (wbc->sync_mode == WB_SYNC_ALL || !sb->s_root) {
 		WARN_ON(!rwsem_is_locked(&sb->s_umount));
 		return 0;
 	}
@@ -597,6 +597,7 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
 		spin_unlock(&sb_lock);
 		goto pinned;
 	}
+	WARN_ON(1);
 	/*
 	 * umounted, drop rwsem again and fall through to failure
 	 */