Message ID | 20220615135850.1961759-1-yebin10@huawei.com |
---|---|
State | Superseded |
Headers | show |
Series | [-next] ext4: fix bug_on in ext4_iomap_begin as race between bmap and write | expand |
On 22/06/15 09:58PM, Ye Bin wrote: > We got issue as follows: > ------------[ cut here ]------------ > WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 > RIP: 0010:ext4_iomap_begin+0x182/0x5d0 > RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 > RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 > RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f > R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 > R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee > FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > iomap_apply+0x119/0x570 > iomap_bmap+0x124/0x150 > ext4_bmap+0x14f/0x250 > bmap+0x55/0x80 > do_vfs_ioctl+0x952/0xbd0 > __x64_sys_ioctl+0xc6/0x170 > do_syscall_64+0x33/0x40 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Above issue may happen as follows: > bmap write > bmap > ext4_bmap > iomap_bmap > ext4_iomap_begin > ext4_file_write_iter > ext4_buffered_write_iter > generic_perform_write > ext4_da_write_begin > ext4_da_write_inline_data_begin > ext4_prepare_inline_data > ext4_create_inline_data > ext4_set_inode_flag(inode, > EXT4_INODE_INLINE_DATA); > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on > > To solved above issue hold inode lock in ext4_bamp. ^^^ ext4_bmap() I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and generic_swapfile_activate() (apart from ioctl()) For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. Same goes with swapfile path as well. However I feel maybe we should hold inode_lock_shared() since there is no block/extent map layout changes that can happen via ext4_bmap(). Hence read lock is what IMO should be used here. -ritesh > > Signed-off-by: Ye Bin <yebin10@huawei.com> > --- > fs/ext4/inode.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 53877ffe3c41..f4a95c80f644 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -3142,13 +3142,15 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) > { > struct inode *inode = mapping->host; > journal_t *journal; > + sector_t ret = 0; > int err; > > + inode_lock(inode); > /* > * We can get here for an inline file via the FIBMAP ioctl > */ > if (ext4_has_inline_data(inode)) > - return 0; > + goto out; > > if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) && > test_opt(inode->i_sb, DELALLOC)) { > @@ -3187,10 +3189,14 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) > jbd2_journal_unlock_updates(journal); > > if (err) > - return 0; > + goto out; > } > > - return iomap_bmap(mapping, block, &ext4_iomap_ops); > + ret = iomap_bmap(mapping, block, &ext4_iomap_ops); > + > +out: > + inode_unlock(inode); > + return ret; > } > > static int ext4_read_folio(struct file *file, struct folio *folio) > -- > 2.31.1 >
On 22/06/15 08:51PM, Ritesh Harjani wrote: > On 22/06/15 09:58PM, Ye Bin wrote: > > We got issue as follows: > > ------------[ cut here ]------------ > > WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 > > RIP: 0010:ext4_iomap_begin+0x182/0x5d0 > > RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 > > RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 > > RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f > > R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 > > R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee > > FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > iomap_apply+0x119/0x570 > > iomap_bmap+0x124/0x150 > > ext4_bmap+0x14f/0x250 > > bmap+0x55/0x80 > > do_vfs_ioctl+0x952/0xbd0 > > __x64_sys_ioctl+0xc6/0x170 > > do_syscall_64+0x33/0x40 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > Above issue may happen as follows: > > bmap write > > bmap > > ext4_bmap > > iomap_bmap > > ext4_iomap_begin > > ext4_file_write_iter > > ext4_buffered_write_iter > > generic_perform_write > > ext4_da_write_begin > > ext4_da_write_inline_data_begin > > ext4_prepare_inline_data > > ext4_create_inline_data > > ext4_set_inode_flag(inode, > > EXT4_INODE_INLINE_DATA); > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on > > > > To solved above issue hold inode lock in ext4_bamp. > ^^^ ext4_bmap() > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and > generic_swapfile_activate() (apart from ioctl()) > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. > Same goes with swapfile path as well. > > However I feel maybe we should hold inode_lock_shared() since there is no > block/extent map layout changes that can happen via ext4_bmap(). > Hence read lock is what IMO should be used here. On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the first place. Should be good to verify it once. -ritesh > > -ritesh > > > > > > Signed-off-by: Ye Bin <yebin10@huawei.com> > > --- > > fs/ext4/inode.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 53877ffe3c41..f4a95c80f644 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -3142,13 +3142,15 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) > > { > > struct inode *inode = mapping->host; > > journal_t *journal; > > + sector_t ret = 0; > > int err; > > > > + inode_lock(inode); > > /* > > * We can get here for an inline file via the FIBMAP ioctl > > */ > > if (ext4_has_inline_data(inode)) > > - return 0; > > + goto out; > > > > if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) && > > test_opt(inode->i_sb, DELALLOC)) { > > @@ -3187,10 +3189,14 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) > > jbd2_journal_unlock_updates(journal); > > > > if (err) > > - return 0; > > + goto out; > > } > > > > - return iomap_bmap(mapping, block, &ext4_iomap_ops); > > + ret = iomap_bmap(mapping, block, &ext4_iomap_ops); > > + > > +out: > > + inode_unlock(inode); > > + return ret; > > } > > > > static int ext4_read_folio(struct file *file, struct folio *folio) > > -- > > 2.31.1 > >
On Wed 15-06-22 21:01:23, Ritesh Harjani wrote: > On 22/06/15 08:51PM, Ritesh Harjani wrote: > > On 22/06/15 09:58PM, Ye Bin wrote: > > > We got issue as follows: > > > ------------[ cut here ]------------ > > > WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 > > > RIP: 0010:ext4_iomap_begin+0x182/0x5d0 > > > RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 > > > RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 > > > RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f > > > R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 > > > R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee > > > FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > iomap_apply+0x119/0x570 > > > iomap_bmap+0x124/0x150 > > > ext4_bmap+0x14f/0x250 > > > bmap+0x55/0x80 > > > do_vfs_ioctl+0x952/0xbd0 > > > __x64_sys_ioctl+0xc6/0x170 > > > do_syscall_64+0x33/0x40 > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > Above issue may happen as follows: > > > bmap write > > > bmap > > > ext4_bmap > > > iomap_bmap > > > ext4_iomap_begin > > > ext4_file_write_iter > > > ext4_buffered_write_iter > > > generic_perform_write > > > ext4_da_write_begin > > > ext4_da_write_inline_data_begin > > > ext4_prepare_inline_data > > > ext4_create_inline_data > > > ext4_set_inode_flag(inode, > > > EXT4_INODE_INLINE_DATA); > > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on > > > > > > To solved above issue hold inode lock in ext4_bamp. > > ^^^ ext4_bmap() > > > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and > > generic_swapfile_activate() (apart from ioctl()) > > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock > > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. > > Same goes with swapfile path as well. > > > > However I feel maybe we should hold inode_lock_shared() since there is no > > block/extent map layout changes that can happen via ext4_bmap(). > > Hence read lock is what IMO should be used here. > > On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't > recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the > first place. Should be good to verify it once. Hum, but I guess there's a deeper problem than ext4_bmap(). Generally we have places doing block mapping (such as ext4_writepages(), readahead, or page fault) where we don't hold i_rwsem and racing ext4_create_inline_data() could confuse them? I guess we need to come up with a sound scheme how inline data creation is serialized with these operations (or just decide to remove the inline data feature altogether as we already discussed once because the complexity likely is not worth the gain). Honza
On 2022/6/15 23:21, Ritesh Harjani wrote: > On 22/06/15 09:58PM, Ye Bin wrote: >> We got issue as follows: >> ------------[ cut here ]------------ >> WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 >> RIP: 0010:ext4_iomap_begin+0x182/0x5d0 >> RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 >> RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 >> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 >> RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f >> R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 >> R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee >> FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> iomap_apply+0x119/0x570 >> iomap_bmap+0x124/0x150 >> ext4_bmap+0x14f/0x250 >> bmap+0x55/0x80 >> do_vfs_ioctl+0x952/0xbd0 >> __x64_sys_ioctl+0xc6/0x170 >> do_syscall_64+0x33/0x40 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> >> Above issue may happen as follows: >> bmap write >> bmap >> ext4_bmap >> iomap_bmap >> ext4_iomap_begin >> ext4_file_write_iter >> ext4_buffered_write_iter >> generic_perform_write >> ext4_da_write_begin >> ext4_da_write_inline_data_begin >> ext4_prepare_inline_data >> ext4_create_inline_data >> ext4_set_inode_flag(inode, >> EXT4_INODE_INLINE_DATA); >> if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on >> >> To solved above issue hold inode lock in ext4_bamp. > ^^^ ext4_bmap() > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and > generic_swapfile_activate() (apart from ioctl()) > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. > Same goes with swapfile path as well. > > However I feel maybe we should hold inode_lock_shared() since there is no > block/extent map layout changes that can happen via ext4_bmap(). > Hence read lock is what IMO should be used here. > > -ritesh Thank you for your advice. > >> Signed-off-by: Ye Bin <yebin10@huawei.com> >> --- >> fs/ext4/inode.c | 12 +++++++++--- >> 1 file changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 53877ffe3c41..f4a95c80f644 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -3142,13 +3142,15 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) >> { >> struct inode *inode = mapping->host; >> journal_t *journal; >> + sector_t ret = 0; >> int err; >> >> + inode_lock(inode); >> /* >> * We can get here for an inline file via the FIBMAP ioctl >> */ >> if (ext4_has_inline_data(inode)) >> - return 0; >> + goto out; >> >> if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) && >> test_opt(inode->i_sb, DELALLOC)) { >> @@ -3187,10 +3189,14 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) >> jbd2_journal_unlock_updates(journal); >> >> if (err) >> - return 0; >> + goto out; >> } >> >> - return iomap_bmap(mapping, block, &ext4_iomap_ops); >> + ret = iomap_bmap(mapping, block, &ext4_iomap_ops); >> + >> +out: >> + inode_unlock(inode); >> + return ret; >> } >> >> static int ext4_read_folio(struct file *file, struct folio *folio) >> -- >> 2.31.1 >> > . >
On 2022/6/16 1:26, Jan Kara wrote: > On Wed 15-06-22 21:01:23, Ritesh Harjani wrote: >> On 22/06/15 08:51PM, Ritesh Harjani wrote: >>> On 22/06/15 09:58PM, Ye Bin wrote: >>>> We got issue as follows: >>>> ------------[ cut here ]------------ >>>> WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 >>>> RIP: 0010:ext4_iomap_begin+0x182/0x5d0 >>>> RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 >>>> RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 >>>> RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f >>>> R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 >>>> R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee >>>> FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Call Trace: >>>> iomap_apply+0x119/0x570 >>>> iomap_bmap+0x124/0x150 >>>> ext4_bmap+0x14f/0x250 >>>> bmap+0x55/0x80 >>>> do_vfs_ioctl+0x952/0xbd0 >>>> __x64_sys_ioctl+0xc6/0x170 >>>> do_syscall_64+0x33/0x40 >>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>> >>>> Above issue may happen as follows: >>>> bmap write >>>> bmap >>>> ext4_bmap >>>> iomap_bmap >>>> ext4_iomap_begin >>>> ext4_file_write_iter >>>> ext4_buffered_write_iter >>>> generic_perform_write >>>> ext4_da_write_begin >>>> ext4_da_write_inline_data_begin >>>> ext4_prepare_inline_data >>>> ext4_create_inline_data >>>> ext4_set_inode_flag(inode, >>>> EXT4_INODE_INLINE_DATA); >>>> if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on >>>> >>>> To solved above issue hold inode lock in ext4_bamp. >>> ^^^ ext4_bmap() >>> >>> I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and >>> generic_swapfile_activate() (apart from ioctl()) >>> For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock >>> of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. >>> Same goes with swapfile path as well. >>> >>> However I feel maybe we should hold inode_lock_shared() since there is no >>> block/extent map layout changes that can happen via ext4_bmap(). >>> Hence read lock is what IMO should be used here. >> On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't >> recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the >> first place. Should be good to verify it once. > Hum, but I guess there's a deeper problem than ext4_bmap(). Generally we > have places doing block mapping (such as ext4_writepages(), readahead, or > page fault) where we don't hold i_rwsem and racing > ext4_create_inline_data() could confuse them? I guess we need to come up > with a sound scheme how inline data creation is serialized with these > operations (or just decide to remove the inline data feature altogether as > we already discussed once because the complexity likely is not worth the > gain). > > Honza Indeed, this feature has various concurrency problems. At present, there is no scenario using this feature in the actual production environment. However, various problems in the code are easy to be exploited by attackers if they are not solved. Do I fix this single point problem or remove inline data feature?
On 22/06/15 07:26PM, Jan Kara wrote: > On Wed 15-06-22 21:01:23, Ritesh Harjani wrote: > > On 22/06/15 08:51PM, Ritesh Harjani wrote: > > > On 22/06/15 09:58PM, Ye Bin wrote: > > > > We got issue as follows: > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 > > > > RIP: 0010:ext4_iomap_begin+0x182/0x5d0 > > > > RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 > > > > RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 > > > > RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f > > > > R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 > > > > R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee > > > > FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > Call Trace: > > > > iomap_apply+0x119/0x570 > > > > iomap_bmap+0x124/0x150 > > > > ext4_bmap+0x14f/0x250 > > > > bmap+0x55/0x80 > > > > do_vfs_ioctl+0x952/0xbd0 > > > > __x64_sys_ioctl+0xc6/0x170 > > > > do_syscall_64+0x33/0x40 > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > Above issue may happen as follows: > > > > bmap write > > > > bmap > > > > ext4_bmap > > > > iomap_bmap > > > > ext4_iomap_begin > > > > ext4_file_write_iter > > > > ext4_buffered_write_iter > > > > generic_perform_write > > > > ext4_da_write_begin > > > > ext4_da_write_inline_data_begin > > > > ext4_prepare_inline_data > > > > ext4_create_inline_data > > > > ext4_set_inode_flag(inode, > > > > EXT4_INODE_INLINE_DATA); > > > > if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on Actually it's only a WARN_ON_ONCE and not a bug_on. (You might have made panic_on_warn set to 1 in your testing) > > > > > > > > To solved above issue hold inode lock in ext4_bamp. > > > ^^^ ext4_bmap() > > > > > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and > > > generic_swapfile_activate() (apart from ioctl()) > > > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock > > > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. > > > Same goes with swapfile path as well. > > > > > > However I feel maybe we should hold inode_lock_shared() since there is no > > > block/extent map layout changes that can happen via ext4_bmap(). > > > Hence read lock is what IMO should be used here. > > > > On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't > > recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the > > first place. Should be good to verify it once. > > Hum, but I guess there's a deeper problem than ext4_bmap(). Generally we > have places doing block mapping (such as ext4_writepages(), readahead, or > page fault) where we don't hold i_rwsem and racing > ext4_create_inline_data() could confuse them? I guess we need to come up You are right, i_rwsem won't be able to protect against such races which you described. So, we actually use EXT4_I(inode)->xattr_sem for inline data serialization. So for this issue, I think if we should move from ext4_iomap_ops to ext4_iomap_report_ops. ext4_iomap_begin_report does takes care of read locking xattr_sem to properly report if it's a inline_data and similarly iomap_bmap reports 0 (which it should) in case of iomap->type != IOMAP_MAPPED (since in this case ext4_iomap_begin_report() will give IOMAP_INLINE) Thoughts? -ritesh > with a sound scheme how inline data creation is serialized with these > operations (or just decide to remove the inline data feature altogether as > we already discussed once because the complexity likely is not worth the > gain). > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR
On Thu 16-06-22 12:01:00, Ritesh Harjani wrote: > On 22/06/15 07:26PM, Jan Kara wrote: > > On Wed 15-06-22 21:01:23, Ritesh Harjani wrote: > > > > > > > > > > To solved above issue hold inode lock in ext4_bamp. > > > > ^^^ ext4_bmap() > > > > > > > > I checked the paths where bmap() kernel api can be called i.e. from jbd2/fc and > > > > generic_swapfile_activate() (apart from ioctl()) > > > > For jbd2, it will be called with j_inode within bmap(), hence taking a inode lock > > > > of the inode passed within ext4_bmap() (j_inode in this case) should be safe here. > > > > Same goes with swapfile path as well. > > > > > > > > However I feel maybe we should hold inode_lock_shared() since there is no > > > > block/extent map layout changes that can happen via ext4_bmap(). > > > > Hence read lock is what IMO should be used here. > > > > > > On second thoughts, shoudn't we use ext4_iomap_report_ops here? Can't > > > recollect why we didn't use ext4_iomap_report_ops for iomap_bmap() in the > > > first place. Should be good to verify it once. > > > > Hum, but I guess there's a deeper problem than ext4_bmap(). Generally we > > have places doing block mapping (such as ext4_writepages(), readahead, or > > page fault) where we don't hold i_rwsem and racing > > ext4_create_inline_data() could confuse them? I guess we need to come up > > You are right, i_rwsem won't be able to protect against such races which you > described. So, we actually use EXT4_I(inode)->xattr_sem for inline data > serialization. Yes, and that is a problem. Because all the places checking for ext4_has_inline_data() would have to be protected by xattr_sem (unless they are already protected by i_rwsem) to make sure we cannot race with inline data creation which is just unworkable both for performance and I suspect also lock ordering reasons. > So for this issue, I think if we should move from ext4_iomap_ops to > ext4_iomap_report_ops. ext4_iomap_begin_report does takes care of read locking > xattr_sem to properly report if it's a inline_data and similarly iomap_bmap > reports 0 (which it should) in case of iomap->type != IOMAP_MAPPED > (since in this case ext4_iomap_begin_report() will give IOMAP_INLINE) I don't think the switch to ext4_iomap_report_ops is really correct. We use ext4_iomap_ops mostly for direct IO and if inline data gets there, there's indeed a deeper problem. Specifically for ext4_bmap() using i_rwsem may be acceptable solution but I'm generally against sprinkling locks here and there without good general locking design how exactly are inline data operations intended to be synchronized with stuff as writeback, readahead etc. Because as I wrote above xattr_sem is not really a working answer... Honza
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 53877ffe3c41..f4a95c80f644 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3142,13 +3142,15 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) { struct inode *inode = mapping->host; journal_t *journal; + sector_t ret = 0; int err; + inode_lock(inode); /* * We can get here for an inline file via the FIBMAP ioctl */ if (ext4_has_inline_data(inode)) - return 0; + goto out; if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) && test_opt(inode->i_sb, DELALLOC)) { @@ -3187,10 +3189,14 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block) jbd2_journal_unlock_updates(journal); if (err) - return 0; + goto out; } - return iomap_bmap(mapping, block, &ext4_iomap_ops); + ret = iomap_bmap(mapping, block, &ext4_iomap_ops); + +out: + inode_unlock(inode); + return ret; } static int ext4_read_folio(struct file *file, struct folio *folio)
We got issue as follows: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 RIP: 0010:ext4_iomap_begin+0x182/0x5d0 RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iomap_apply+0x119/0x570 iomap_bmap+0x124/0x150 ext4_bmap+0x14f/0x250 bmap+0x55/0x80 do_vfs_ioctl+0x952/0xbd0 __x64_sys_ioctl+0xc6/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Above issue may happen as follows: bmap write bmap ext4_bmap iomap_bmap ext4_iomap_begin ext4_file_write_iter ext4_buffered_write_iter generic_perform_write ext4_da_write_begin ext4_da_write_inline_data_begin ext4_prepare_inline_data ext4_create_inline_data ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on To solved above issue hold inode lock in ext4_bamp. Signed-off-by: Ye Bin <yebin10@huawei.com> --- fs/ext4/inode.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)