mbox series

[v4,00/12] Change readahead API

Message ID 20200201151240.24082-1-willy@infradead.org
Headers show
Series Change readahead API | expand

Message

Matthew Wilcox Feb. 1, 2020, 3:12 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

I would particularly value feedback on this from the gfs2 and ocfs2
maintainers.  They have non-trivial changes, and a review on patch 5
would be greatly appreciated.

This series adds a readahead address_space operation to eventually
replace the readpages operation.  The key difference is that
pages are added to the page cache as they are allocated (and
then looked up by the filesystem) instead of passing them on a
list to the readpages operation and having the filesystem add
them to the page cache.  It's a net reduction in code for each
implementation, more efficient than walking a list, and solves
the direct-write vs buffered-read problem reported by yu kuai at
https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/

v4:
 - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))
 - Add comment to __do_page_cache_readahead() acknowledging we don't
   care _that_ much about setting PageReadahead.
 - Fix the return value check of add_to_page_cache_lru().
 - Add a missing call to put_page() in __do_page_cache_readahead() if
   we fail to insert the page.
 - Improve the documentation of ->readahead (including indentation
   problem identified by Randy).
 - Fix off by one error in read_pages() (Dave Chinner).
 - Fix nr_pages manipulation in btrfs (Dave Chinner).
 - Remove bogus refcount fix in erofs (Gao Xiang, Dave Chinner).
 - Update ext4 patch for Merkle tree readahead.
 - Update f2fs patch for Merkle tree readahead.
 - Reinstate next_page label in f2fs_readpages() now it's used by the
   compression code.
 - Reinstate call to fuse_wait_on_page_writeback (Miklos Szeredi).
 - Remove a double-unlock in the error path in fuse.
 - Remove an odd fly-speck in fuse_readpages().
 - Make nr_pages loop in fuse_readpages less convoluted (Dave Chinner).

Matthew Wilcox (Oracle) (12):
  mm: Fix the return type of __do_page_cache_readahead
  readahead: Ignore return value of ->readpages
  readahead: Put pages in cache earlier
  mm: Add readahead address space operation
  fs: Convert mpage_readpages to mpage_readahead
  btrfs: Convert from readpages to readahead
  erofs: Convert uncompressed files from readpages to readahead
  erofs: Convert compressed files from readpages to readahead
  ext4: Convert from readpages to readahead
  f2fs: Convert from readpages to readahead
  fuse: Convert from readpages to readahead
  iomap: Convert from readpages to readahead

 Documentation/filesystems/locking.rst |  7 ++-
 Documentation/filesystems/vfs.rst     | 14 +++++
 drivers/staging/exfat/exfat_super.c   |  9 +--
 fs/block_dev.c                        |  9 +--
 fs/btrfs/extent_io.c                  | 19 +++---
 fs/btrfs/extent_io.h                  |  2 +-
 fs/btrfs/inode.c                      | 18 +++---
 fs/erofs/data.c                       | 33 ++++------
 fs/erofs/zdata.c                      | 21 +++----
 fs/ext2/inode.c                       | 12 ++--
 fs/ext4/ext4.h                        |  5 +-
 fs/ext4/inode.c                       | 24 ++++----
 fs/ext4/readpage.c                    | 20 +++---
 fs/ext4/verity.c                      | 16 +++--
 fs/f2fs/data.c                        | 35 +++++------
 fs/f2fs/f2fs.h                        |  5 +-
 fs/f2fs/verity.c                      | 16 +++--
 fs/fat/inode.c                        |  8 +--
 fs/fuse/file.c                        | 37 +++++------
 fs/gfs2/aops.c                        | 20 +++---
 fs/hpfs/file.c                        |  8 +--
 fs/iomap/buffered-io.c                | 74 +++++-----------------
 fs/iomap/trace.h                      |  2 +-
 fs/isofs/inode.c                      |  9 +--
 fs/jfs/inode.c                        |  8 +--
 fs/mpage.c                            | 38 ++++--------
 fs/nilfs2/inode.c                     | 13 ++--
 fs/ocfs2/aops.c                       | 32 +++++-----
 fs/omfs/file.c                        |  8 +--
 fs/qnx6/inode.c                       |  8 +--
 fs/reiserfs/inode.c                   | 10 +--
 fs/udf/inode.c                        |  8 +--
 fs/xfs/xfs_aops.c                     | 10 +--
 include/linux/fs.h                    |  2 +
 include/linux/iomap.h                 |  2 +-
 include/linux/mpage.h                 |  2 +-
 include/linux/pagemap.h               | 12 ++++
 include/trace/events/erofs.h          |  6 +-
 include/trace/events/f2fs.h           |  6 +-
 mm/internal.h                         |  2 +-
 mm/migrate.c                          |  2 +-
 mm/readahead.c                        | 89 ++++++++++++++++++---------
 42 files changed, 332 insertions(+), 349 deletions(-)

Comments

David Sterba Feb. 4, 2020, 3:32 p.m. UTC | #1
On Sat, Feb 01, 2020 at 07:12:28AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> I would particularly value feedback on this from the gfs2 and ocfs2
> maintainers.  They have non-trivial changes, and a review on patch 5
> would be greatly appreciated.
> 
> This series adds a readahead address_space operation to eventually
> replace the readpages operation.  The key difference is that
> pages are added to the page cache as they are allocated (and
> then looked up by the filesystem) instead of passing them on a
> list to the readpages operation and having the filesystem add
> them to the page cache.  It's a net reduction in code for each
> implementation, more efficient than walking a list, and solves
> the direct-write vs buffered-read problem reported by yu kuai at
> https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> 
> v4:
>  - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))

I've tried to test the patchset but haven't got very far, it crashes at boot
ritht after VFS mounts the root. The patches are from mailinglist, applied on
current master, bug I saw the same crash with the git branch in your
repo (probably v1).

(gdb) l *(ext4_mpage_readpages+0x1da/0xc20)
0xffffffff813753f0 is in ext4_mpage_readpages (fs/ext4/readpage.c:226).
221             return i_size_read(inode);
222     }
223
224     int ext4_mpage_readpages(struct address_space *mapping, pgoff_t start,
225                     struct page *page, unsigned nr_pages, bool is_readahead)
226     {
227             struct bio *bio = NULL;
228             sector_t last_block_in_bio = 0;
229
230             struct inode *inode = mapping->host;

[    8.008531] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    8.011482] #PF: supervisor read access in kernel mode
[    8.014121] #PF: error_code(0x0000) - not-present page
[    8.016767] PGD 0 P4D 0
[    8.018352] Oops: 0000 [#1] SMP
[    8.019716] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.5.0-default+ #955
[    8.021746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[    8.025244] RIP: 0010:ext4_mpage_readpages+0x1da/0xc20
[    8.026817] Code: 7c 24 4e 00 0f 85 23 04 00 00 44 29 74 24 3c 83 6c 24 48 01 0f 84 4d 04 00 00 80 7c 24 4e 00 0f 85 fc 05 00 00 48 8b 4c 24 18 <48> 8b 01 f6 c4 20 75 89 4c 8b 69 20 b9 0c 00 00 00 2b 4c 24 38 83
[    8.031957] RSP: 0000:ffffb34f40013988 EFLAGS: 00010292
[    8.033691] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    8.035533] RDX: 0000000000000001 RSI: ffffffff960934c0 RDI: ffffffff9681a080
[    8.036900] RBP: 0000000000000001 R08: ffffb34f40013a68 R09: 0000000000000000
[    8.038461] R10: 0000000000000038 R11: 0000000000000000 R12: 0000000000000004
[    8.040698] R13: ffff9668ba4e18e0 R14: 0000000000000001 R15: 0000000000000000
[    8.042805] FS:  0000000000000000(0000) GS:ffff9668bda00000(0000) knlGS:0000000000000000
[    8.045396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.047233] CR2: 0000000000000000 CR3: 000000002e011001 CR4: 0000000000160ee0
[    8.049337] Call Trace:
[    8.050435]  ? __lock_acquire+0xee0/0x1320
[    8.051833]  ? release_pages+0x310/0x380
[    8.053265]  ? mark_held_locks+0x50/0x80
[    8.054468]  ext4_readahead+0x3b/0x50
[    8.055877]  read_pages+0x65/0x1a0
[    8.057167]  ? put_pages_list+0x90/0x90
[    8.058689]  __do_page_cache_readahead+0x24b/0x2a0
[    8.060394]  generic_file_buffered_read+0x7cf/0x9f0
[    8.062137]  ? sched_clock+0x5/0x10
[    8.063451]  ? up_read+0x18/0x240
[    8.064774]  ? ext4_xattr_get+0x97/0x2c0
[    8.066178]  new_sync_read+0x111/0x1a0
[    8.067423]  vfs_read+0xc5/0x180
[    8.068572]  kernel_read+0x2c/0x40
[    8.069788]  prepare_binprm+0x171/0x1b0
[    8.071311]  load_script+0x1c1/0x250
[    8.072643]  search_binary_handler+0x5f/0x210
[    8.074135]  exec_binprm+0xd7/0x290
[    8.075463]  __do_execve_file.isra.0+0x570/0x800
[    8.077400]  ? rest_init+0x2f1/0x2f5
[    8.078979]  do_execve+0x21/0x30
[    8.080420]  kernel_init+0xa4/0x11b
[    8.081856]  ? rest_init+0x2f5/0x2f5
[    8.083173]  ret_from_fork+0x24/0x30
[    8.084695] Modules linked in:
[    8.086055] CR2: 0000000000000000
[    8.087572] ---[ end trace 0890c371a706b34a ]---
[    8.089417] RIP: 0010:ext4_mpage_readpages+0x1da/0xc20
[    8.116836] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
[    8.119626] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[    8.122392] INFO: lockdep is turned off.
[    8.123694] irq event stamp: 18341344
[    8.124735] hardirqs last  enabled at (18341343): [<ffffffff95230c42>] free_unref_page_list+0x232/0x270
[    8.127918] hardirqs last disabled at (18341344): [<ffffffff95002b4b>] trace_hardirqs_off_thunk+0x1a/0x1c
[    8.131145] softirqs last  enabled at (18341250): [<ffffffff95a00358>] __do_softirq+0x358/0x52b
[    8.143060] softirqs last disabled at (18341243): [<ffffffff9508ae3d>] irq_exit+0x9d/0xb0
[    8.145603] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G      D           5.5.0-default+ #955
[    8.148474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[    8.152440] Call Trace:
[    8.153747]  dump_stack+0x71/0xa0
[    8.155238]  ___might_sleep.cold+0xa6/0xf9
[    8.156903]  exit_signals+0x31/0x310
[    8.158431]  ? __do_execve_file.isra.0+0x570/0x800
[    8.160179]  do_exit+0xa8/0xd60
[    8.161632]  ? rest_init+0x2f1/0x2f5
[    8.163204]  rewind_stack_do_exit+0x17/0x20
[    8.164931] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    8.167575] Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Matthew Wilcox Feb. 4, 2020, 5:16 p.m. UTC | #2
On Tue, Feb 04, 2020 at 04:32:27PM +0100, David Sterba wrote:
> On Sat, Feb 01, 2020 at 07:12:28AM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > I would particularly value feedback on this from the gfs2 and ocfs2
> > maintainers.  They have non-trivial changes, and a review on patch 5
> > would be greatly appreciated.
> > 
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation.  The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache.  It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> > 
> > v4:
> >  - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))
> 
> I've tried to test the patchset but haven't got very far, it crashes at boot
> ritht after VFS mounts the root. The patches are from mailinglist, applied on
> current master, bug I saw the same crash with the git branch in your
> repo (probably v1).

Yeah, I wasn't able to test at the time due to what turned out to be
the hpet bug in Linus' tree.  Now that's fixed, I've found & fixed a
couple more bugs.  There'll be a v5 once I fix the remaining problem
(looks like a missing page unlock somewhere).