Message ID: 20231127062116.2355129-2-yukuai1@huaweicloud.com
State: New
Series: block: remove field 'bd_inode' from block_device
On Mon, Nov 27, 2023 at 02:21:01PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> block_device is allocated from bdev_alloc() by bdev_alloc_inode(), and
> currently block_device contains a pointer that points to the address of
> the inode, while such inode is allocated together:

This is going the wrong way.  Nothing outside of core block layer code
should ever directly use the bdev inode.  We've been rather sloppy
and added a lot of direct references to it, but they really need to
go away and be replaced with well defined high level operations on
struct block_device.  Once that is done we can remove the bd_inode
pointer, but replacing it with something that pokes even more deeply
into bdev internals is a bad idea.
Hi,

On 2023/11/27 15:21, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 02:21:01PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> block_device is allocated from bdev_alloc() by bdev_alloc_inode(), and
>> currently block_device contains a pointer that points to the address of
>> the inode, while such inode is allocated together:
>
> This is going the wrong way.  Nothing outside of core block layer code
> should ever directly use the bdev inode.  We've been rather sloppy
> and added a lot of direct references to it, but they really need to
> go away and be replaced with well defined high level operations on
> struct block_device.  Once that is done we can remove the bd_inode
> pointer, but replacing it with something that pokes even more deeply
> into bdev internals is a bad idea.

Thanks for the advice! However, after collecting how other modules are
using the bdev inode, I have two main questions:

1) Is it okay to add new helpers that take a bdev for the following
APIs? If so, then almost all filesystems and drivers can avoid
accessing bd_inode directly.
errseq_check(&bdev->bd_inode->i_mapping->wb_err, wb_err);
errseq_check_and_advance(&bdev->bd_inode->i_mapping->wb_err, &wb_err);
mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
i_size_read(bdev->bd_inode);
find_get_page(bdev->bd_inode->i_mapping, offset);
find_or_create_page(bdev->bd_inode->i_mapping, index, gfp);
read_cache_page_gfp(bdev->bd_inode->i_mapping, index, gfp);
invalidate_inode_pages2(bdev->bd_inode->i_mapping);
invalidate_inode_pages2_range(bdev->bd_inode->i_mapping, start, end);
read_mapping_folio(bdev->bd_inode->i_mapping, index, file);
read_mapping_page(bdev->bd_inode->i_mapping, index, file);
balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
file_ra_state_init(ra, bdev->bd_inode->i_mapping);
page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index, req_count);
inode_to_bdi(bdev->bd_inode);

2) In fs/buffer.c there is some special usage like the following, for
which I don't think it's good to add a helper:

spin_lock(&bd_inode->i_mapping->private_lock);

Is it okay to move the following APIs from fs/buffer.c directly to
block/bdev.c?

__find_get_block
bdev_getblk

Thanks,
Kuai
On Mon, Nov 27, 2023 at 09:07:22PM +0800, Yu Kuai wrote:
> 1) Is it okay to add new helpers that take a bdev for the following
> APIs?

For some we already have them (e.g. bdev_nr_bytes to read the bdev
size), for some we need to add them.  The big thing that seems to
stick out is the page cache API, and I think that is where we need to
define maintainable APIs for file systems and others to use the
block device page cache.  Probably only in folio versions and not
page ones if we're touching the code anyway.

> 2) In fs/buffer.c there is some special usage like the following, for
> which I don't think it's good to add a helper:
>
> spin_lock(&bd_inode->i_mapping->private_lock);
>
> Is it okay to move the following APIs from fs/buffer.c directly to
> block/bdev.c?
>
> __find_get_block
> bdev_getblk

I'm not sure moving is a good idea, but we might end up needing
some kind of low-level access from buffer.c, be that special
helpers, a separate header or something else.  Let's sort out
the rest of the kernel first.
Hi,

On 2023/11/28 0:32, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 09:07:22PM +0800, Yu Kuai wrote:
>> 1) Is it okay to add new helpers that take a bdev for the following
>> APIs?
>
> For some we already have them (e.g. bdev_nr_bytes to read the bdev
> size), for some we need to add them.  The big thing that seems to
> stick out is the page cache API, and I think that is where we need to
> define maintainable APIs for file systems and others to use the
> block device page cache.  Probably only in folio versions and not
> page ones if we're touching the code anyway.

Thanks for the advice! In case I'm understanding correctly, do you mean
that all the other filesystems and drivers that are using the page
versions can safely switch to the folio versions now?

By the way, my original idea was to add a new field 'bd_flags' to
block_device, and then add a new bit so that bio_check_ro() will
only warn once for each partition. Now that this patchset will be quite
complex, I'll add a new bool field 'bd_ro_warned' to fix the above
problem first, and then add 'bd_flags' once this patchset is done.

Thanks,
Kuai

>> 2) In fs/buffer.c there is some special usage like the following, for
>> which I don't think it's good to add a helper:
>>
>> spin_lock(&bd_inode->i_mapping->private_lock);
>>
>> Is it okay to move the following APIs from fs/buffer.c directly to
>> block/bdev.c?
>>
>> __find_get_block
>> bdev_getblk
>
> I'm not sure moving is a good idea, but we might end up needing
> some kind of low-level access from buffer.c, be that special
> helpers, a separate header or something else.  Let's sort out
> the rest of the kernel first.
On Tue, Nov 28, 2023 at 09:35:56AM +0800, Yu Kuai wrote:
> Thanks for the advice! In case I'm understanding correctly, do you mean
> that all the other filesystems and drivers that are using the page
> versions can safely switch to the folio versions now?

If you never allocate a high-order folio, pages are identical to folios.
So yes, we can do folio based interfaces only, and also use that as an
opportunity to convert over the callers.

> By the way, my original idea was to add a new field 'bd_flags' to
> block_device, and then add a new bit so that bio_check_ro() will
> only warn once for each partition. Now that this patchset will be quite
> complex, I'll add a new bool field 'bd_ro_warned' to fix the above
> problem first, and then add 'bd_flags' once this patchset is done.

Yes, please do a minimal version if you can find space where the rmw
cycles don't cause damage to neighbouring fields.  Or just leave the
current set of warnings in if it's too hard.
diff --git a/block/bdev.c b/block/bdev.c
index e4cfb7adb645..7509389095b7 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,11 +30,6 @@
 #include "../fs/internal.h"
 #include "blk.h"
 
-struct bdev_inode {
-	struct block_device bdev;
-	struct inode vfs_inode;
-};
-
 static inline struct bdev_inode *BDEV_I(struct inode *inode)
 {
 	return container_of(inode, struct bdev_inode, vfs_inode);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..06de8393dcd1 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -85,6 +85,18 @@ struct block_device {
 #define bdev_kobj(_bdev) \
 	(&((_bdev)->bd_device.kobj))
 
+struct bdev_inode {
+	struct block_device bdev;
+	struct inode vfs_inode;
+};
+
+static inline struct inode *bdev_inode(struct block_device *bdev)
+{
+	struct bdev_inode *bi = container_of(bdev, struct bdev_inode, bdev);
+
+	return &bi->vfs_inode;
+}
+
 /*
  * Block error status values.  See block/blk-core:blk_errors for the details.
  * Alpha cannot write a byte atomically, so we need to use 32-bit value.