mbox series

[RFC,00/24] vfs, nfsd, nfs: implement directory delegations

Message ID 20240315-dir-deleg-v1-0-a1d6209a3654@kernel.org
Headers show
Series vfs, nfsd, nfs: implement directory delegations | expand

Message

Jeff Layton March 15, 2024, 4:52 p.m. UTC
NFSv4.1 adds a new GET_DIR_DELEGATION operation, to allow clients
to request a delegation on a directory. If the client holds a directory
delegation, then it knows that nothing will change the dentries in it
until it has been recalled.

Last year, Rick Macklem gave a talk at the NFS Bakeathon on his
implementation of directory delegations for FreeBSD [1], and showed that
it can greatly improve LOOKUP-heavy workloads. There is also some
earlier work by CITI [2] that showed similar results. The SMB protocol
also has a similar sort of construct, and they have also seen great
performance improvements from it.

This patchset adds minimal support for directory delegations to the VFS
layer, and and plumbs support for GET_DIR_DELEGATION handling into nfsd.
It also has a set of patches for the NFS client. The VFS and server-side
bits all seem to work now and do the right thing. The client side is
still very much a WIP since it relies more on policy and heuristics.

What I've found is that without directory delegations in play, the
client usually revalidates dentries by issuing a GETATTR for the parent.
If it holds a delegation on the parent, it can avoid that GETATTR, which
is a nice benefit.

But...the catch is that the Linux NFS client defaults to caching
dentries for 30-60s before revalidating them. The attrcache timeout
quickly ends up as 60s on directories that aren't changing, so for a
delegation to be useful, you need to be revalidating dentries in the
same directory over a rather long span of time. Even then, at best it'll
be optimizing away one GETATTR per inode every 60s or so.

I've done a little ad-hoc testing with kernel builds, but they don't
show much benefit from this. Testing with them both enabled and
disabled, the kernel builds finished within 5 seconds of one another on
a 12.5 minute build, which is insignificant.

The GETATTR load seems to be reduced by about 7% with dir delegs
enabled, which is nice, but GETATTRs are usually pretty lightweight
anyway, and we do have the overhead of dir delegations to deal with.

There is a lot of room for improvement that could make this more
compelling:

1) The client's logic for fetching a delegation is probably too
   primitive. Maybe we need to request one only after every 3-5
   revalidations?

2) The client uses the same "REFERENCED" timeout for dir delegations
   as it does for clients. It could probably benefit from holding them
   longer by default. I'm not sure how best to do that.

3) The protocol allows for clients and server to use asynchronous
   notifications to avoid issuing delegation breaks when there are
   changes to a delegated directory. That's not implemented yet,
   but it could change the calculus here for workloads where multiple
   clients are accessing and changing child dentries.

4) A GET_DIR_DELEGATION+READDIR compound could be worthwhile. If we have
   all of the dentries in the directory, then we could (in principle)
   satisfy any negtive dentry lookup w/o going to the server. If they're
   all in the correct order on d_subdirs, we could satisfy a readdir
   via dcache_readdir without going to the server.

5) The server could probably avoid handing them out if the inode changed
   <60s in the past?

For the VFS and NFSD bits, I mainly want to get some early feedback.
Does this seem like a reasonable approach? Anyone see major problems
with handling directory delegations in the VFS codepaths?

The NFS client bits are quite a bit more rough.  Is there a better
heuristic for when to request a dir delegation? I could use a bit of
guidance here.

[1]: https://www.youtube.com/watch?v=DdFyH3BN5pI
[2]: https://linux-nfs.org/wiki/index.php/CITI_Experience_with_Directory_Delegations

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Jeff Layton (24):
      filelock: push the S_ISREG check down to ->setlease handlers
      filelock: add a lm_set_conflict lease_manager callback
      vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
      vfs: allow mkdir to wait for delegation break on parent
      vfs: allow rmdir to wait for delegation break on parent
      vfs: break parent dir delegations in open(..., O_CREAT) codepath
      vfs: make vfs_create break delegations on parent directory
      vfs: make vfs_mknod break delegations on parent directory
      filelock: lift the ban on directory leases in generic_setlease
      nfsd: allow filecache to hold S_IFDIR files
      nfsd: allow DELEGRETURN on directories
      nfsd: encoders and decoders for GET_DIR_DELEGATION
      nfsd: check for delegation conflicts vs. the same client
      nfsd: wire up GET_DIR_DELEGATION handling
      nfs: fix nfs_stateid_hash prototype when CONFIG_CRC32 isn't set
      nfs: remove unused NFS_CALL macro
      nfs: add cache_validity to the nfs_inode_event tracepoints
      nfs: add a tracepoint to nfs_inode_detach_delegation_locked
      nfs: new tracepoint in nfs_delegation_need_return
      nfs: new tracepoint in match_stateid operation
      nfs: add a GDD_GETATTR rpc operation
      nfs: skip dentry revalidation when parent dir has a delegation
      nfs: optionally request a delegation on GETATTR
      nfs: add a module parameter to disable directory delegations

 drivers/base/devtmpfs.c   |   6 +-
 fs/cachefiles/namei.c     |   2 +-
 fs/ecryptfs/inode.c       |   8 +--
 fs/init.c                 |   4 +-
 fs/locks.c                |  12 +++-
 fs/namei.c                | 101 ++++++++++++++++++++++++++++------
 fs/nfs/delegation.c       |   5 ++
 fs/nfs/dir.c              |  20 +++++++
 fs/nfs/internal.h         |   2 +-
 fs/nfs/nfs4file.c         |   2 +
 fs/nfs/nfs4proc.c         |  55 ++++++++++++++++++-
 fs/nfs/nfs4trace.h        | 104 +++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c          | 136 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfstrace.h         |   8 ++-
 fs/nfsd/filecache.c       |  37 +++++++++++--
 fs/nfsd/filecache.h       |   2 +
 fs/nfsd/nfs3proc.c        |   2 +-
 fs/nfsd/nfs4proc.c        |  49 +++++++++++++++++
 fs/nfsd/nfs4recover.c     |   6 +-
 fs/nfsd/nfs4state.c       | 112 +++++++++++++++++++++++++++++++++++++-
 fs/nfsd/nfs4xdr.c         |  72 +++++++++++++++++++++++-
 fs/nfsd/state.h           |   5 ++
 fs/nfsd/vfs.c             |  13 +++--
 fs/nfsd/vfs.h             |   2 +-
 fs/nfsd/xdr4.h            |   8 +++
 fs/open.c                 |   2 +-
 fs/overlayfs/overlayfs.h  |   8 +--
 fs/smb/client/cifsfs.c    |   3 +
 fs/smb/server/vfs.c       |   8 +--
 include/linux/filelock.h  |  10 ++++
 include/linux/fs.h        |  11 ++--
 include/linux/nfs4.h      |   6 ++
 include/linux/nfs_fs.h    |   1 +
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |   9 +--
 net/unix/af_unix.c        |   2 +-
 36 files changed, 754 insertions(+), 80 deletions(-)
---
base-commit: b80a250a20975e30ff8635dd75d6d20bf05405a1
change-id: 20240215-dir-deleg-e212210ba9d4

Best regards,