mbox series

[v7,00/19] gfs2: Fix mmap + page fault deadlocks

Message ID 20210827164926.1726765-1-agruenba@redhat.com
Headers show
Series gfs2: Fix mmap + page fault deadlocks | expand

Message

Andreas Gruenbacher Aug. 27, 2021, 4:49 p.m. UTC
Hi all,

here's another update on top of v5.14-rc7.  Changes:

 * Some of the patch descriptions have been improved.

 * Patch "gfs2: Eliminate ip->i_gh" has been moved further to the front.

At this point, I'm not aware of anything that still needs fixing, 


The first two patches are independent of the core of this patch queue
and I've asked the respective maintainers to have a look, but I've not
heard back from them.  The first patch should just go into Al's tree;
it's a relatively straight-forward fix.  The second patch really needs
to be looked at; it might break things:

  iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
  powerpc/kvm: Fix kvm_use_magic_page


Al and Linus seem to have a disagreement about the error reporting
semantics that functions fault_in_{readable,writeable} and
fault_in_iov_iter_{readable,writeable} should have.  I've implemented
Linus's suggestion of returning the number of bytes not faulted in and I
think that being able to tell if "nothing", "something" or "everything"
could be faulted in does help, but I'll live with anything that allows
us to make progress.


The iomap changes should ideally be reviewed by Christoph; I've not
heard from him about those.


Thanks,
Andreas

Andreas Gruenbacher (16):
  iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
  powerpc/kvm: Fix kvm_use_magic_page
  gup: Turn fault_in_pages_{readable,writeable} into
    fault_in_{readable,writeable}
  iov_iter: Turn iov_iter_fault_in_readable into
    fault_in_iov_iter_readable
  iov_iter: Introduce fault_in_iov_iter_writeable
  gfs2: Add wrapper for iomap_file_buffered_write
  gfs2: Clean up function may_grant
  gfs2: Move the inode glock locking to gfs2_file_buffered_write
  gfs2: Eliminate ip->i_gh
  gfs2: Fix mmap + page fault deadlocks for buffered I/O
  iomap: Fix iomap_dio_rw return value for user copies
  iomap: Support partial direct I/O on user copy failures
  iomap: Add done_before argument to iomap_dio_rw
  gup: Introduce FOLL_NOFAULT flag to disable page faults
  iov_iter: Introduce nofault flag to disable page faults
  gfs2: Fix mmap + page fault deadlocks for direct I/O

Bob Peterson (3):
  gfs2: Eliminate vestigial HIF_FIRST
  gfs2: Remove redundant check from gfs2_glock_dq
  gfs2: Introduce flag for glock holder auto-demotion

 arch/powerpc/kernel/kvm.c           |   3 +-
 arch/powerpc/kernel/signal_32.c     |   4 +-
 arch/powerpc/kernel/signal_64.c     |   2 +-
 arch/x86/kernel/fpu/signal.c        |   7 +-
 drivers/gpu/drm/armada/armada_gem.c |   7 +-
 fs/btrfs/file.c                     |   7 +-
 fs/btrfs/ioctl.c                    |   5 +-
 fs/ext4/file.c                      |   5 +-
 fs/f2fs/file.c                      |   2 +-
 fs/fuse/file.c                      |   2 +-
 fs/gfs2/bmap.c                      |  60 +----
 fs/gfs2/file.c                      | 245 ++++++++++++++++++--
 fs/gfs2/glock.c                     | 340 +++++++++++++++++++++-------
 fs/gfs2/glock.h                     |  20 ++
 fs/gfs2/incore.h                    |   5 +-
 fs/iomap/buffered-io.c              |   2 +-
 fs/iomap/direct-io.c                |  21 +-
 fs/ntfs/file.c                      |   2 +-
 fs/xfs/xfs_file.c                   |   6 +-
 fs/zonefs/super.c                   |   4 +-
 include/linux/iomap.h               |  11 +-
 include/linux/mm.h                  |   3 +-
 include/linux/pagemap.h             |  58 +----
 include/linux/uio.h                 |   4 +-
 lib/iov_iter.c                      | 103 +++++++--
 mm/filemap.c                        |   4 +-
 mm/gup.c                            | 139 +++++++++++-
 27 files changed, 785 insertions(+), 286 deletions(-)

Comments

Linus Torvalds Aug. 27, 2021, 5:16 p.m. UTC | #1
On Fri, Aug 27, 2021 at 9:49 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> here's another update on top of v5.14-rc7.  Changes:
>
>  * Some of the patch descriptions have been improved.
>
>  * Patch "gfs2: Eliminate ip->i_gh" has been moved further to the front.
>
> At this point, I'm not aware of anything that still needs fixing,

From a quick scan, I didn't see anything that raised my hackles.

But I skipped all the gfs2-specific changes in the series, since
that's all above my paygrade.

                 Linus
Andreas Gruenbacher Sept. 1, 2021, 7:52 p.m. UTC | #2
On Fri, Aug 27, 2021 at 7:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Aug 27, 2021 at 9:49 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >
> > here's another update on top of v5.14-rc7.  Changes:
> >
> >  * Some of the patch descriptions have been improved.
> >
> >  * Patch "gfs2: Eliminate ip->i_gh" has been moved further to the front.
> >
> > At this point, I'm not aware of anything that still needs fixing,
>
> From a quick scan, I didn't see anything that raised my hackles.

So there's a minor merge conflict between Christoph's iomap_iter
conversion and this patch queue now, and I should probably clarify the
description of "iomap: Add done_before argument to iomap_dio_rw" that
Darrick ran into. Then there are the user copy issues that Al has
pointed out. Fixing those will create superficial conflicts with this
patch queue, but probably nothing serious.

So how should I proceed: do you expect a v8 of this patch queue on top
of the current mainline?

Thanks,
Andreas
Filipe Manana Sept. 3, 2021, 3:07 p.m. UTC | #3
On Fri, Aug 27, 2021 at 5:51 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> Hi all,
>
> here's another update on top of v5.14-rc7.  Changes:
>
>  * Some of the patch descriptions have been improved.
>
>  * Patch "gfs2: Eliminate ip->i_gh" has been moved further to the front.
>
> At this point, I'm not aware of anything that still needs fixing,

Hi, thanks for doing this.

In btrfs we also have a deadlock (after the conversion to use iomap
for direct IO) triggered by your recent test case for fstests,
generic/647 [1].
Even though we can fix it in btrfs without touching iomap, iov_iter,
etc, it would be too complex for such a rare and exotic case (a user
passing a buffer for a direct IO read/write that is memory mapped to
the same file range of the operation is very uncommon at least). But
this patchset would make the fix much simpler and cleaner.

One thing I noticed is that, for direct IO reads, despite setting the
->nofault attribute of the iov_iter to true, we can still get page
faults while in the iomap code.
This happens when reading from holes and unwritten/prealloc extents,
because iomap calls iov_iter_zero() and this seems to ignore the value
of ->nofault.
Is that intentional? I can get around it by surrounding the iomap call
with pagefault_disable() / pagefault_enable(), but it seems odd to do
so, given that iov_iter->nofault was set to true.

[1] https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=d3cbdabffc4cb28850e97bc7bd8a7a1460db94e5

Thanks.

>
>
> The first two patches are independent of the core of this patch queue
> and I've asked the respective maintainers to have a look, but I've not
> heard back from them.  The first patch should just go into Al's tree;
> it's a relatively straight-forward fix.  The second patch really needs
> to be looked at; it might break things:
>
>   iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
>   powerpc/kvm: Fix kvm_use_magic_page
>
>
> Al and Linus seem to have a disagreement about the error reporting
> semantics that functions fault_in_{readable,writeable} and
> fault_in_iov_iter_{readable,writeable} should have.  I've implemented
> Linus's suggestion of returning the number of bytes not faulted in and I
> think that being able to tell if "nothing", "something" or "everything"
> could be faulted in does help, but I'll live with anything that allows
> us to make progress.
>
>
> The iomap changes should ideally be reviewed by Christoph; I've not
> heard from him about those.
>
>
> Thanks,
> Andreas
>
> Andreas Gruenbacher (16):
>   iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
>   powerpc/kvm: Fix kvm_use_magic_page
>   gup: Turn fault_in_pages_{readable,writeable} into
>     fault_in_{readable,writeable}
>   iov_iter: Turn iov_iter_fault_in_readable into
>     fault_in_iov_iter_readable
>   iov_iter: Introduce fault_in_iov_iter_writeable
>   gfs2: Add wrapper for iomap_file_buffered_write
>   gfs2: Clean up function may_grant
>   gfs2: Move the inode glock locking to gfs2_file_buffered_write
>   gfs2: Eliminate ip->i_gh
>   gfs2: Fix mmap + page fault deadlocks for buffered I/O
>   iomap: Fix iomap_dio_rw return value for user copies
>   iomap: Support partial direct I/O on user copy failures
>   iomap: Add done_before argument to iomap_dio_rw
>   gup: Introduce FOLL_NOFAULT flag to disable page faults
>   iov_iter: Introduce nofault flag to disable page faults
>   gfs2: Fix mmap + page fault deadlocks for direct I/O
>
> Bob Peterson (3):
>   gfs2: Eliminate vestigial HIF_FIRST
>   gfs2: Remove redundant check from gfs2_glock_dq
>   gfs2: Introduce flag for glock holder auto-demotion
>
>  arch/powerpc/kernel/kvm.c           |   3 +-
>  arch/powerpc/kernel/signal_32.c     |   4 +-
>  arch/powerpc/kernel/signal_64.c     |   2 +-
>  arch/x86/kernel/fpu/signal.c        |   7 +-
>  drivers/gpu/drm/armada/armada_gem.c |   7 +-
>  fs/btrfs/file.c                     |   7 +-
>  fs/btrfs/ioctl.c                    |   5 +-
>  fs/ext4/file.c                      |   5 +-
>  fs/f2fs/file.c                      |   2 +-
>  fs/fuse/file.c                      |   2 +-
>  fs/gfs2/bmap.c                      |  60 +----
>  fs/gfs2/file.c                      | 245 ++++++++++++++++++--
>  fs/gfs2/glock.c                     | 340 +++++++++++++++++++++-------
>  fs/gfs2/glock.h                     |  20 ++
>  fs/gfs2/incore.h                    |   5 +-
>  fs/iomap/buffered-io.c              |   2 +-
>  fs/iomap/direct-io.c                |  21 +-
>  fs/ntfs/file.c                      |   2 +-
>  fs/xfs/xfs_file.c                   |   6 +-
>  fs/zonefs/super.c                   |   4 +-
>  include/linux/iomap.h               |  11 +-
>  include/linux/mm.h                  |   3 +-
>  include/linux/pagemap.h             |  58 +----
>  include/linux/uio.h                 |   4 +-
>  lib/iov_iter.c                      | 103 +++++++--
>  mm/filemap.c                        |   4 +-
>  mm/gup.c                            | 139 +++++++++++-
>  27 files changed, 785 insertions(+), 286 deletions(-)
>
> --
> 2.26.3
>
Linus Torvalds Sept. 3, 2021, 3:52 p.m. UTC | #4
On Wed, Sep 1, 2021 at 12:53 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> So there's a minor merge conflict between Christoph's iomap_iter
> conversion and this patch queue now, and I should probably clarify the
> description of "iomap: Add done_before argument to iomap_dio_rw" that
> Darrick ran into. Then there are the user copy issues that Al has
> pointed out. Fixing those will create superficial conflicts with this
> patch queue, but probably nothing serious.
>
> So how should I proceed: do you expect a v8 of this patch queue on top
> of the current mainline?

So if you rebase for fixes, it's going to be a "next merge window" thing again.

Personally, I'm ok with the series as is, and the conflict isn't an
issue. So I'd take it as is, and then people can fix up niggling
issues later.

But if somebody screams loudly..

             Linus
Al Viro Sept. 3, 2021, 6:25 p.m. UTC | #5
On Fri, Sep 03, 2021 at 08:52:00AM -0700, Linus Torvalds wrote:
> On Wed, Sep 1, 2021 at 12:53 PM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> >
> > So there's a minor merge conflict between Christoph's iomap_iter
> > conversion and this patch queue now, and I should probably clarify the
> > description of "iomap: Add done_before argument to iomap_dio_rw" that
> > Darrick ran into. Then there are the user copy issues that Al has
> > pointed out. Fixing those will create superficial conflicts with this
> > patch queue, but probably nothing serious.
> >
> > So how should I proceed: do you expect a v8 of this patch queue on top
> > of the current mainline?
> 
> So if you rebase for fixes, it's going to be a "next merge window" thing again.
> 
> Personally, I'm ok with the series as is, and the conflict isn't an
> issue. So I'd take it as is, and then people can fix up niggling
> issues later.
> 
> But if somebody screams loudly..

FWIW, my objections regarding the calling conventions are still there.

Out of 3 callers that do want more than "success/failure", one is gone
(series by tglx) and one more is broken (regardless of the semantics,
btrfs on arm64).  Which leaves 1 caller (fault-in for read in FPU
handling on x86 sigreturn).  That caller turns out to be correct, but
IMO there are fairly strong arguments in favour of *not* using the
normal fault-in in that case.

	"how many bytes can we fault in" is misleading, unless we really
poke into every cacheline in the affected area.  Which might be a primitive
worth having, but it's a lot heavier than needed by the majority of callers.
Linus Torvalds Sept. 3, 2021, 6:47 p.m. UTC | #6
On Fri, Sep 3, 2021 at 11:28 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, my objections regarding the calling conventions are still there.

So I'm happy to further change the calling conventions, but by now
_that_ part is most definitely a "not this merge window". The need for
that ternary state is still there.

It might go away in the future, but I think that's literally that: a
future cleanup. Not really related to the problem at hand.

              Linus