Message ID | 20200125013553.24899-1-willy@infradead.org |
---|---|
Headers | show |
Series | Change readahead API | expand |
On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote: > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > This series adds a readahead address_space operation to eventually > replace the readpages operation. The key difference is that > pages are added to the page cache as they are allocated (and > then looked up by the filesystem) instead of passing them on a > list to the readpages operation and having the filesystem add > them to the page cache. It's a net reduction in code for each > implementation, more efficient than walking a list, and solves > the direct-write vs buffered-read problem reported by yu kuai at > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ Unclear which patch fixes this and how it did it?
On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote: > On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@infradead.org> wrote: > > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > > > This series adds a readahead address_space operation to eventually > > replace the readpages operation. The key difference is that > > pages are added to the page cache as they are allocated (and > > then looked up by the filesystem) instead of passing them on a > > list to the readpages operation and having the filesystem add > > them to the page cache. It's a net reduction in code for each > > implementation, more efficient than walking a list, and solves > > the direct-write vs buffered-read problem reported by yu kuai at > > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ > > Unclear which patch fixes this and how it did it? I suppose the problem isn't fixed until patch 13/13 is applied. What yu kuai is seeing is a race where readahead allocates a page, then passes it to iomap_readpages, which calls xfs_read_iomap_begin() which looks up the extent. Then thread 2 does DIO which modifies the extent, because there's nothing to say that thread 1 is still using it. With this patch series, the readpages code puts the locked pages in the cache before calling iomap_readpages, so any racing write will block on the locked page until readahead is completed. If you're tempted to put this into -mm, I have a couple of new changes; one to fix a kernel-doc warning for mpage_readahead() and one to add kernel-doc for iomap_readahead(): +++ b/fs/mpage.c @@ -339,9 +339,7 @@ /** * mpage_readahead - start reads against pages - * @mapping: the address_space - * @start: The number of the first page to read. - * @nr_pages: The number of consecutive pages to read. + * @rac: Describes which pages to read. * @get_block: The filesystem's block mapper function. * * This function walks the pages and the blocks within each page, building and +++ b/fs/iomap/buffered-io.c @@ -395,6 +395,21 @@ return done; } +/** + * iomap_readahead - Attempt to read pages from a file. + * @rac: Describes the pages to be read. + * @ops: The operations vector for the filesystem. + * + * This function is for filesystems to call to implement their readahead + * address_space operation. + * + * Context: The file is pinned by the caller, and the pages to be read are + * all locked and have an elevated refcount. This function will unlock + * the pages (once I/O has completed on them, or I/O has been determined to + * not be necessary). It will also decrease the refcount once the pages + * have been submitted for I/O. After this point, the page may be removed + * from the page cache, and should not be referenced. + */ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops) { struct inode *inode = rac->mapping->host; I'll do a v6 with those changes soon, but I would really like a bit more review from filesystem people, particularly ocfs2 and gfs2.
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> This series adds a readahead address_space operation to eventually replace the readpages operation. The key difference is that pages are added to the page cache as they are allocated (and then looked up by the filesystem) instead of passing them on a list to the readpages operation and having the filesystem add them to the page cache. It's a net reduction in code for each implementation, more efficient than walking a list, and solves the direct-write vs buffered-read problem reported by yu kuai at https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ Matthew Wilcox (Oracle) (12): mm: Fix the return type of __do_page_cache_readahead readahead: Ignore return value of ->readpages readahead: Put pages in cache earlier mm: Add readahead address space operation fs: Convert mpage_readpages to mpage_readahead btrfs: Convert from readpages to readahead erofs: Convert uncompressed files from readpages to readahead erofs: Convert compressed files from readpages to readahead ext4: Convert from readpages to readahead f2fs: Convert from readpages to readahead fuse: Convert from readpages to readahead iomap: Convert from readpages to readahead Documentation/filesystems/locking.rst | 7 ++- Documentation/filesystems/vfs.rst | 11 ++++ drivers/staging/exfat/exfat_super.c | 9 ++-- fs/block_dev.c | 9 ++-- fs/btrfs/extent_io.c | 15 ++---- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 18 +++---- fs/erofs/data.c | 34 +++++------- fs/erofs/zdata.c | 21 +++----- fs/ext2/inode.c | 12 ++--- fs/ext4/ext4.h | 2 +- fs/ext4/inode.c | 24 ++++----- fs/ext4/readpage.c | 20 +++---- fs/f2fs/data.c | 33 +++++------- fs/fat/inode.c | 8 +-- fs/fuse/file.c | 35 ++++++------ fs/gfs2/aops.c | 20 ++++--- fs/hpfs/file.c | 8 +-- fs/iomap/buffered-io.c | 74 ++++++-------------------- fs/iomap/trace.h | 2 +- fs/isofs/inode.c | 9 ++-- fs/jfs/inode.c | 8 +-- fs/mpage.c | 38 +++++--------- fs/nilfs2/inode.c | 13 ++--- fs/ocfs2/aops.c | 32 +++++------ fs/omfs/file.c | 8 +-- fs/qnx6/inode.c | 8 +-- fs/reiserfs/inode.c | 10 ++-- fs/udf/inode.c | 8 +-- fs/xfs/xfs_aops.c | 10 ++-- include/linux/fs.h | 2 + include/linux/iomap.h | 2 +- include/linux/mpage.h | 2 +- include/linux/pagemap.h | 12 +++++ include/trace/events/erofs.h | 6 +-- include/trace/events/f2fs.h | 6 +-- mm/internal.h | 2 +- mm/migrate.c | 2 +- mm/readahead.c | 76 +++++++++++++++++---------- 39 files changed, 289 insertions(+), 329 deletions(-)