Message ID | 20200125013553.24899-5-willy@infradead.org |
---|---|
State | Superseded |
Headers | show |
Series | Change readahead API | expand |
On 1/24/20 5:35 PM, Matthew Wilcox wrote: > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst > index 7d4d09dd5e6d..bb06fb7b120b 100644 > --- a/Documentation/filesystems/vfs.rst > +++ b/Documentation/filesystems/vfs.rst > @@ -706,6 +706,8 @@ cache in your filesystem. The following members are defined: > int (*readpage)(struct file *, struct page *); > int (*writepages)(struct address_space *, struct writeback_control *); > int (*set_page_dirty)(struct page *page); > + unsigned (*readahead)(struct file *filp, struct address_space *mapping, > + pgoff_t start, unsigned nr_pages); > int (*readpages)(struct file *filp, struct address_space *mapping, > struct list_head *pages, unsigned nr_pages); > int (*write_begin)(struct file *, struct address_space *mapping, > @@ -781,6 +783,15 @@ cache in your filesystem. The following members are defined: > If defined, it should set the PageDirty flag, and the > PAGECACHE_TAG_DIRTY tag in the radix tree. > > +``readahead`` > + called by the VM to read pages associated with the address_space > + object. The pages are consecutive in the page cache and are > + locked. The implementation should decrement the page refcount after > + attempting I/O on each page. Usually the page will be unlocked by > + the I/O completion handler. If the function does not attempt I/O on > + some pages, return the number of pages which were not read so the > + common code can unlock the pages for you. > + Please use consistent indentation (tabs). > ``readpages`` > called by the VM to read pages associated with the address_space > object. This is essentially just a vector version of readpage. cheers.
On Fri, Jan 24, 2020 at 05:35:45PM -0800, Matthew Wilcox wrote: > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > This replaces ->readpages with a saner interface: > - Return the number of pages not read instead of an ignored error code. > - Pages are already in the page cache when ->readahead is called. > - Implementation looks up the pages in the page cache instead of > having them passed in a linked list. .... > diff --git a/mm/readahead.c b/mm/readahead.c > index 5a6676640f20..6d65dae6dad0 100644 > --- a/mm/readahead.c > +++ b/mm/readahead.c > @@ -121,7 +121,18 @@ static void read_pages(struct address_space *mapping, struct file *filp, > > blk_start_plug(&plug); > > - if (mapping->a_ops->readpages) { > + if (mapping->a_ops->readahead) { > + unsigned left = mapping->a_ops->readahead(filp, mapping, > + start, nr_pages); > + > + while (left) { > + struct page *page = readahead_page(mapping, > + start + nr_pages - left - 1); Off by one? start = 2, nr_pages = 2, left = 1, this looks up the page at index 2, which is the one we issued IO on, not the one we "left behind" which is at index 3. Cheers, Dave.
On Wed, Jan 29, 2020 at 11:24:56AM +1100, Dave Chinner wrote: > On Fri, Jan 24, 2020 at 05:35:45PM -0800, Matthew Wilcox wrote: > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > > > This replaces ->readpages with a saner interface: > > - Return the number of pages not read instead of an ignored error code. > > - Pages are already in the page cache when ->readahead is called. > > - Implementation looks up the pages in the page cache instead of > > having them passed in a linked list. > .... > > diff --git a/mm/readahead.c b/mm/readahead.c > > index 5a6676640f20..6d65dae6dad0 100644 > > --- a/mm/readahead.c > > +++ b/mm/readahead.c > > @@ -121,7 +121,18 @@ static void read_pages(struct address_space *mapping, struct file *filp, > > > > blk_start_plug(&plug); > > > > - if (mapping->a_ops->readpages) { > > + if (mapping->a_ops->readahead) { > > + unsigned left = mapping->a_ops->readahead(filp, mapping, > > + start, nr_pages); > > + > > + while (left) { > > + struct page *page = readahead_page(mapping, > > + start + nr_pages - left - 1); > > Off by one? start = 2, nr_pages = 2, left = 1, this looks up the > page at index 2, which is the one we issued IO on, not the one we > "left behind" which is at index 3. Yup. I originally had: while (left--) ... decided that was too confusing and didn't quite complete that thought.
On Fri, Jan 24, 2020 at 07:57:40PM -0800, Randy Dunlap wrote: > > +``readahead`` > > + called by the VM to read pages associated with the address_space > > + object. The pages are consecutive in the page cache and are > > + locked. The implementation should decrement the page refcount after > > + attempting I/O on each page. Usually the page will be unlocked by > > + the I/O completion handler. If the function does not attempt I/O on > > + some pages, return the number of pages which were not read so the > > + common code can unlock the pages for you. > > + > > Please use consistent indentation (tabs). This turned out to be not my fault. The vim rst ... mode? plugin? Whatever it is, it's converting tabs to spaces! To fix it, I had to rename the file to .txt, make the edits, then rename it back. This is very poor behaviour.
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 5057e4d9dcd1..d8a5dde914b5 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -239,6 +239,8 @@ prototypes:: int (*readpage)(struct file *, struct page *); int (*writepages)(struct address_space *, struct writeback_control *); int (*set_page_dirty)(struct page *page); + unsigned (*readahead)(struct file *, struct address_space *, + pgoff_t start, unsigned nr_pages); int (*readpages)(struct file *filp, struct address_space *mapping, struct list_head *pages, unsigned nr_pages); int (*write_begin)(struct file *, struct address_space *mapping, @@ -271,7 +273,8 @@ writepage: yes, unlocks (see below) readpage: yes, unlocks writepages: set_page_dirty no -readpages: +readahead: yes, unlocks +readpages: no write_begin: locks the page exclusive write_end: yes, unlocks exclusive bmap: @@ -295,6 +298,8 @@ the request handler (/dev/loop). ->readpage() unlocks the page, either synchronously or via I/O completion. +->readahead() unlocks the page like ->readpage(). + ->readpages() populates the pagecache with the passed pages and starts I/O against them. They come unlocked upon I/O completion. diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 7d4d09dd5e6d..bb06fb7b120b 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -706,6 +706,8 @@ cache in your filesystem. The following members are defined: int (*readpage)(struct file *, struct page *); int (*writepages)(struct address_space *, struct writeback_control *); int (*set_page_dirty)(struct page *page); + unsigned (*readahead)(struct file *filp, struct address_space *mapping, + pgoff_t start, unsigned nr_pages); int (*readpages)(struct file *filp, struct address_space *mapping, struct list_head *pages, unsigned nr_pages); int (*write_begin)(struct file *, struct address_space *mapping, @@ -781,6 +783,15 @@ cache in your filesystem. The following members are defined: If defined, it should set the PageDirty flag, and the PAGECACHE_TAG_DIRTY tag in the radix tree. +``readahead`` + called by the VM to read pages associated with the address_space + object. The pages are consecutive in the page cache and are + locked. The implementation should decrement the page refcount after + attempting I/O on each page. Usually the page will be unlocked by + the I/O completion handler. If the function does not attempt I/O on + some pages, return the number of pages which were not read so the + common code can unlock the pages for you. + ``readpages`` called by the VM to read pages associated with the address_space object. This is essentially just a vector version of readpage. diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349adb52..a10f3a72e5ac 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -375,6 +375,8 @@ struct address_space_operations { */ int (*readpages)(struct file *filp, struct address_space *mapping, struct list_head *pages, unsigned nr_pages); + unsigned (*readahead)(struct file *, struct address_space *, + pgoff_t start, unsigned nr_pages); int (*write_begin)(struct file *, struct address_space *mapping, loff_t pos, unsigned len, unsigned flags, diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 37a4d9e32cd3..2baafd236a82 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -630,6 +630,18 @@ static inline int add_to_page_cache(struct page *page, return error; } +/* + * Only call this from a ->readahead implementation. + */ +static inline +struct page *readahead_page(struct address_space *mapping, pgoff_t index) +{ + struct page *page = xa_load(&mapping->i_pages, index); + VM_BUG_ON_PAGE(!PageLocked(page), page); + + return page; +} + static inline unsigned long dir_pages(struct inode *inode) { return (unsigned long)(inode->i_size + PAGE_SIZE - 1) >> diff --git a/mm/readahead.c b/mm/readahead.c index 5a6676640f20..6d65dae6dad0 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -121,7 +121,18 @@ static void read_pages(struct address_space *mapping, struct file *filp, blk_start_plug(&plug); - if (mapping->a_ops->readpages) { + if (mapping->a_ops->readahead) { + unsigned left = mapping->a_ops->readahead(filp, mapping, + start, nr_pages); + + while (left) { + struct page *page = readahead_page(mapping, + start + nr_pages - left - 1); + unlock_page(page); + put_page(page); + left--; + } + } else if (mapping->a_ops->readpages) { mapping->a_ops->readpages(filp, mapping, pages, nr_pages); /* Clean up the remaining pages */ put_pages_list(pages);