Message ID | 20240328163424.2781320-1-dhowells@redhat.com |
---|---|
Series | netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache |
On Thu, 28 Mar 2024 16:33:52 +0000, David Howells wrote:
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
>
> The reworking also:
>
> [...]

Pulled from netfs-writeback which contains the minor fixes pointed out.

---

Applied to the vfs.netfs branch of the vfs/vfs.git tree.
Patches in the vfs.netfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.netfs

On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Hi Christian, Willy,
>
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
>
> The reworking also:
>
>  (1) builds on top of the new writeback_iter() infrastructure;
>
>  (2) makes it possible to use vectored write RPCs as discontiguous streams
>      of pages can be accommodated;
>
>  (3) makes it easier to do simultaneous content crypto and stream division.
>
>  (4) provides support for retrying writes and re-dividing a stream;
>
>  (5) replaces the ->launder_folio() op, so that ->writepages() is used
>      instead;
>
>  (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
>      structs to avoid allocation failure in the writeback path.
>
> Some code that uses the fscache page flag is retained for compatibility
> purposes with nfs and ceph.  The code is switched to using the synonymous
> private_2 label instead and marked with deprecation comments.  I have a
> separate set of patches that convert cifs to use this code.
>
> -~-
>
> In this new implementation, writeback_iter() is used to pump folios,
> progressively creating two parallel, but separate streams.  Either or both
> streams can contain gaps, and the subrequests in each stream can be of
> variable size, don't need to align with each other and don't need to align
> with the folios.  (Note that more streams can be added if we have multiple
> servers to duplicate data to).
>
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
>
>          +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>          +---+---+-----+-----+---+----------+
>
>            +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>            +------+------+     +----+----+
>
>          +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>          +------+------+------+------+------+
>
> Data that got read from the server that needs copying to the cache is
> stored in folios that are marked dirty and have folio->private set to a
> special value.
>
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
>
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
>
> The algorithm is split into three parts:
>
>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>      it and creating subrequests.  The part of this that generates
>      subrequests only deals with file positions and spans and so is usable
>      for DIO/unbuffered writes as well as buffered writes.
>
>  (2) The collector.  This asynchronously collects completed subrequests,
>      unlocks folios, frees crypto buffers and performs any retries.  This
>      runs in a work queue so that the issuer can return to the caller for
>      writeback (so that the VM can have its kswapd thread back) or async
>      writes.
>
>      Collection is slightly complex as the collector has to work out where
>      discontiguities happen in the folio list so that it doesn't try and
>      collect folios that weren't included in the write out.
>
>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>      subrequests to complete and then goes through the failed subrequests
>      to reissue them.  This may involve reprepping them (with cifs, the
>      credits must be renegotiated and a subrequest may need splitting), and
>      doing RMW for content crypto if there's a conflicting change on the
>      server.
>
> David
>
> David Howells (26):
>   cifs: Fix duplicate fscache cookie warnings
>   9p: Clean up some kdoc and unused var warnings.
>   netfs: Update i_blocks when write committed to pagecache
>   netfs: Replace PG_fscache by setting folio->private and marking dirty
>   mm: Remove the PG_fscache alias for PG_private_2
>   netfs: Remove deprecated use of PG_private_2 as a second writeback
>     flag
>   netfs: Make netfs_io_request::subreq_counter an atomic_t
>   netfs: Use subreq_counter to allocate subreq debug_index values
>   mm: Provide a means of invalidation without using launder_folio
>   cifs: Use alternative invalidation to using launder_folio
>   9p: Use alternative invalidation to using launder_folio
>   afs: Use alternative invalidation to using launder_folio
>   netfs: Remove ->launder_folio() support
>   netfs: Use mempools for allocating requests and subrequests
>   mm: Export writeback_iter()
>   netfs: Switch to using unsigned long long rather than loff_t
>   netfs: Fix writethrough-mode error handling
>   netfs: Add some write-side stats and clean up some stat names
>   netfs: New writeback implementation
>   netfs, afs: Implement helpers for new write code
>   netfs, 9p: Implement helpers for new write code
>   netfs, cachefiles: Implement helpers for new write code
>   netfs: Cut over to using new writeback code
>   netfs: Remove the old writeback code
>   netfs: Miscellaneous tidy ups
>   netfs, afs: Use writeback retry to deal with alternate keys
>
>  fs/9p/vfs_addr.c             |  60 +--
>  fs/9p/vfs_inode_dotl.c       |   4 -
>  fs/afs/file.c                |   8 +-
>  fs/afs/internal.h            |   6 +-
>  fs/afs/validation.c          |   4 +-
>  fs/afs/write.c               | 187 ++++----
>  fs/cachefiles/io.c           |  75 +++-
>  fs/ceph/addr.c               |  24 +-
>  fs/ceph/inode.c              |   2 +
>  fs/netfs/Makefile            |   3 +-
>  fs/netfs/buffered_read.c     |  40 +-
>  fs/netfs/buffered_write.c    | 832 ++++-------------------------------
>  fs/netfs/direct_write.c      |  30 +-
>  fs/netfs/fscache_io.c        |  14 +-
>  fs/netfs/internal.h          |  55 ++-
>  fs/netfs/io.c                | 155 +------
>  fs/netfs/main.c              |  55 ++-
>  fs/netfs/misc.c              |  10 +-
>  fs/netfs/objects.c           |  81 +++-
>  fs/netfs/output.c            | 478 --------------------
>  fs/netfs/stats.c             |  17 +-
>  fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
>  fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
>  fs/nfs/file.c                |   8 +-
>  fs/nfs/fscache.h             |   6 +-
>  fs/nfs/write.c               |   4 +-
>  fs/smb/client/cifsfs.h       |   1 -
>  fs/smb/client/file.c         | 136 +-----
>  fs/smb/client/fscache.c      |  16 +-
>  fs/smb/client/inode.c        |  27 +-
>  include/linux/fscache.h      |  22 +-
>  include/linux/netfs.h        | 196 +++++----
>  include/linux/pagemap.h      |   1 +
>  include/net/9p/client.h      |   2 +
>  include/trace/events/netfs.h | 249 ++++++++++-
>  mm/filemap.c                 |  52 ++-
>  mm/page-writeback.c          |   1 +
>  net/9p/Kconfig               |   1 +
>  net/9p/client.c              |  49 +++
>  net/9p/trans_fd.c            |   1 -
>  40 files changed, 2492 insertions(+), 1906 deletions(-)
>  delete mode 100644 fs/netfs/output.c
>  create mode 100644 fs/netfs/write_collect.c
>  create mode 100644 fs/netfs/write_issue.c
>

This all looks pretty reasonable. There is at least one bugfix that looks
like it ought to go in independently (#17). #19 is huge, complex and hard
to review. That will need some cycles in -next, I think. In any case, on
any that I didn't send comments you can add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>
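
For readers following the cover letter's point that subrequests need not
align with folios, here is a minimal userspace sketch of the idea. It is
not netfslib code: struct span, divide_stream() and folios_touched() are
hypothetical names invented for illustration, and the model assumes a
single contiguous stream with a fixed maximum subrequest size. It divides
a byte range purely by file position and span, then counts how many folios
each resulting subrequest overlaps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model type; not a netfslib structure. */
struct span {
	unsigned long long start;	/* file position of the first byte */
	unsigned long long len;		/* number of bytes covered */
};

/*
 * Divide the byte range [start, start + len) into subrequests of at most
 * max_len bytes each, paying no attention to folio boundaries.  Returns
 * the number of subrequests stored in out[], which never exceeds max_out.
 */
static size_t divide_stream(unsigned long long start, unsigned long long len,
			    unsigned long long max_len,
			    struct span *out, size_t max_out)
{
	size_t n = 0;

	assert(max_len > 0);	/* a zero cap would never make progress */

	while (len > 0 && n < max_out) {
		unsigned long long part = len < max_len ? len : max_len;

		out[n].start = start;
		out[n].len = part;
		start += part;
		len -= part;
		n++;
	}
	return n;
}

/*
 * Count how many folios a subrequest overlaps.  Folios are modelled as
 * spans as well; two half-open ranges overlap when each one starts
 * before the other ends.
 */
static size_t folios_touched(const struct span *sub,
			     const struct span *folios, size_t nr_folios)
{
	unsigned long long s_end = sub->start + sub->len;
	size_t i, count = 0;

	for (i = 0; i < nr_folios; i++) {
		unsigned long long f_end = folios[i].start + folios[i].len;

		if (folios[i].start < s_end && sub->start < f_end)
			count++;
	}
	return count;
}
```

With three folios of 4096, 4096 and 8192 bytes starting at offset 0 and an
assumed cap of 6000 bytes, the 16384-byte range splits into three
subrequests. The first crosses a folio boundary and so touches two folios,
while the last folio ends up covered by two different subrequests, which
are the two cases the diagram in the cover letter depicts.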