Message ID | 20120103013152.GA26455@dztty |
---|---|
State | New, archived |
Headers | show |
Hello, On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > this can lead to a race with the other operations that update the same > inode. > > Patch tested. Thanks for the patch but I don't quite understand the problem. i_generation is set when: a) inode is loaded from disk b) inode is allocated c) in SETVERSION ioctl The only thing that can race here seems to be c) against c) and that is racy with i_mutex as well. So what problems do you exactly observe without the patch? Honza > Signed-off-by: Djalal Harouni <tixxdz@opendz.org> > --- > fs/ext3/ioctl.c | 6 +++++- > fs/ext4/ioctl.c | 6 +++++- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c > index ba1b54e..e7b2ed9 100644 > --- a/fs/ext3/ioctl.c > +++ b/fs/ext3/ioctl.c > @@ -134,10 +134,11 @@ flags_out: > goto setversion_out; > } > > + mutex_lock(&inode->i_mutex); > handle = ext3_journal_start(inode, 1); > if (IS_ERR(handle)) { > err = PTR_ERR(handle); > - goto setversion_out; > + goto unlock_out; > } > err = ext3_reserve_inode_write(handle, inode, &iloc); > if (err == 0) { > @@ -146,6 +147,9 @@ flags_out: > err = ext3_mark_iloc_dirty(handle, inode, &iloc); > } > ext3_journal_stop(handle); > + > +unlock_out: > + mutex_unlock(&inode->i_mutex); > setversion_out: > mnt_drop_write(filp->f_path.mnt); > return err; > diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c > index a567968..46a8de6 100644 > --- a/fs/ext4/ioctl.c > +++ b/fs/ext4/ioctl.c > @@ -158,10 +158,11 @@ flags_out: > goto setversion_out; > } > > + mutex_lock(&inode->i_mutex); > handle = ext4_journal_start(inode, 1); > if (IS_ERR(handle)) { > err = PTR_ERR(handle); > - goto setversion_out; > + goto unlock_out; > } > err = ext4_reserve_inode_write(handle, inode, &iloc); > if (err == 0) { > @@ -170,6 +171,9 @@ flags_out: > err = ext4_mark_iloc_dirty(handle, inode, &iloc); > } > ext4_journal_stop(handle); > + > +unlock_out: > + mutex_unlock(&inode->i_mutex); > setversion_out: > mnt_drop_write(filp->f_path.mnt); > return err; > -- > 1.7.1
On Tue, Jan 03, 2012 at 01:46:24PM +0100, Jan Kara wrote: > Hello, > > On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > > > > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > > this can lead to a race with the other operations that update the same > > inode. > > > > Patch tested. > Thanks for the patch but I don't quite understand the problem. > i_generation is set when: > a) inode is loaded from disk > b) inode is allocated > c) in SETVERSION ioctl > > The only thing that can race here seems to be c) against c) and that is > racy with i_mutex as well. So what problems do you exactly observe without > the patch? Right, but what about the related i_ctime change ? (i_ctime is updated in other places...) The i_ctime update must reflect the _appropriate_ inode modification operation. This is why IMHO we should protect them to avoid a lost update. BTW the i_generation which is used by NFS and fuse filesystems is updated even if the inode is marked immutable, is this the intended behaviour? > Honza Thanks for your response.
On Wed 04-01-12 00:14:32, Djalal Harouni wrote: > On Tue, Jan 03, 2012 at 01:46:24PM +0100, Jan Kara wrote: > > On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > > > > > > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > > > this can lead to a race with the other operations that update the same > > > inode. > > > > > > Patch tested. > > Thanks for the patch but I don't quite understand the problem. > > i_generation is set when: > > a) inode is loaded from disk > > b) inode is allocated > > c) in SETVERSION ioctl > > > > The only thing that can race here seems to be c) against c) and that is > > racy with i_mutex as well. So what problems do you exactly observe without > > the patch? > Right, but what about the related i_ctime change ? (i_ctime is updated in > other places...) > > The i_ctime update must reflect the _appropriate_ inode modification > operation. This is why IMHO we should protect them to avoid a lost update. Yes, but realistically even if we race with someone else updating i_ctime, the times will be rather similar so there's hardly a real difference. Anyway, using i_mutex is consistent with how we handle permission changes or timestamp changes and the ioctl isn't performance critical so I'll take your patch. I was just wondering whether you observed a real problem or whether it's more or less a theoretical concern. > BTW the i_generation which is used by NFS and fuse filesystems is updated > even if the inode is marked immutable, is this the intended behaviour? Well, I'd say it's unexpected that generation can be changed for immutable inode so I'd be inclined to take a patch that would make ext3 refuse to do that. But it's a change in how the ioctl behaves so I'd like to hear opinion of Ted or Andrew on this. Honza
On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > this can lead to a race with the other operations that update the same > inode. > > Patch tested. OK, so I've taken the patch into my tree, just updated the changelog which result of our discussion in this thread. I also took the ext4 part since there is no risk of conflict and the patch looks obvious. Honza > Signed-off-by: Djalal Harouni <tixxdz@opendz.org> > --- > fs/ext3/ioctl.c | 6 +++++- > fs/ext4/ioctl.c | 6 +++++- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c > index ba1b54e..e7b2ed9 100644 > --- a/fs/ext3/ioctl.c > +++ b/fs/ext3/ioctl.c > @@ -134,10 +134,11 @@ flags_out: > goto setversion_out; > } > > + mutex_lock(&inode->i_mutex); > handle = ext3_journal_start(inode, 1); > if (IS_ERR(handle)) { > err = PTR_ERR(handle); > - goto setversion_out; > + goto unlock_out; > } > err = ext3_reserve_inode_write(handle, inode, &iloc); > if (err == 0) { > @@ -146,6 +147,9 @@ flags_out: > err = ext3_mark_iloc_dirty(handle, inode, &iloc); > } > ext3_journal_stop(handle); > + > +unlock_out: > + mutex_unlock(&inode->i_mutex); > setversion_out: > mnt_drop_write(filp->f_path.mnt); > return err; > diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c > index a567968..46a8de6 100644 > --- a/fs/ext4/ioctl.c > +++ b/fs/ext4/ioctl.c > @@ -158,10 +158,11 @@ flags_out: > goto setversion_out; > } > > + mutex_lock(&inode->i_mutex); > handle = ext4_journal_start(inode, 1); > if (IS_ERR(handle)) { > err = PTR_ERR(handle); > - goto setversion_out; > + goto unlock_out; > } > err = ext4_reserve_inode_write(handle, inode, &iloc); > if (err == 0) { > @@ -170,6 +171,9 @@ flags_out: > err = ext4_mark_iloc_dirty(handle, inode, &iloc); > } > ext4_journal_stop(handle); > + > +unlock_out: > + mutex_unlock(&inode->i_mutex); > setversion_out: > mnt_drop_write(filp->f_path.mnt); > return err; > -- > 1.7.1
On 2012-01-04, at 10:46 AM, Jan Kara wrote: > On Tue 03-01-12 02:31:52, Djalal Harouni wrote: >> >> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, >> this can lead to a race with the other operations that update the same >> inode. >> >> Patch tested. > > OK, so I've taken the patch into my tree, just updated the changelog > which result of our discussion in this thread. I also took the ext4 part > since there is no risk of conflict and the patch looks obvious. Actually, I'd like to hear more about whether this is a real problem, or if it is just a theoretical problem found during code inspection or from some static code analysis tool? With the metadata checksum feature we were discussing using the inode generation as part of the seed for the directory leaf block checksum, so that it wasn't possible to incorrectly access stale directory blocks from a previous incarnation of the same inode number. We were discussing just disabling this ioctl on filesystems with metadata checksums, and printing a deprecation warning for filesystems without that feature enabled. I'm not aware of any real-world use for this ioctl, since NFS cannot use it to reconstruct handles because there's no API to allocate an inode with a specific number, so setting the generation is pointless. This ioctl (despite its confusing name) is completely different from the NFSv4 inode version, which is stored in i_version (and i_version_hi). Cheers, Andreas >> Signed-off-by: Djalal Harouni <tixxdz@opendz.org> >> --- >> fs/ext3/ioctl.c | 6 +++++- >> fs/ext4/ioctl.c | 6 +++++- >> 2 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c >> index ba1b54e..e7b2ed9 100644 >> --- a/fs/ext3/ioctl.c >> +++ b/fs/ext3/ioctl.c >> @@ -134,10 +134,11 @@ flags_out: >> goto setversion_out; >> } >> >> + mutex_lock(&inode->i_mutex); >> handle = ext3_journal_start(inode, 1); >> if (IS_ERR(handle)) { >> err = PTR_ERR(handle); >> - goto setversion_out; >> + goto unlock_out; >> } >> err = ext3_reserve_inode_write(handle, inode, &iloc); >> if (err == 0) { >> @@ -146,6 +147,9 @@ flags_out: >> err = ext3_mark_iloc_dirty(handle, inode, &iloc); >> } >> ext3_journal_stop(handle); >> + >> +unlock_out: >> + mutex_unlock(&inode->i_mutex); >> setversion_out: >> mnt_drop_write(filp->f_path.mnt); >> return err; >> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c >> index a567968..46a8de6 100644 >> --- a/fs/ext4/ioctl.c >> +++ b/fs/ext4/ioctl.c >> @@ -158,10 +158,11 @@ flags_out: >> goto setversion_out; >> } >> >> + mutex_lock(&inode->i_mutex); >> handle = ext4_journal_start(inode, 1); >> if (IS_ERR(handle)) { >> err = PTR_ERR(handle); >> - goto setversion_out; >> + goto unlock_out; >> } >> err = ext4_reserve_inode_write(handle, inode, &iloc); >> if (err == 0) { >> @@ -170,6 +171,9 @@ flags_out: >> err = ext4_mark_iloc_dirty(handle, inode, &iloc); >> } >> ext4_journal_stop(handle); >> + >> +unlock_out: >> + mutex_unlock(&inode->i_mutex); >> setversion_out: >> mnt_drop_write(filp->f_path.mnt); >> return err; >> -- >> 1.7.1 > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 04-01-12 16:15:04, Andreas Dilger wrote: > On 2012-01-04, at 10:46 AM, Jan Kara wrote: > > On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > >> > >> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > >> this can lead to a race with the other operations that update the same > >> inode. > >> > >> Patch tested. > > > > OK, so I've taken the patch into my tree, just updated the changelog > > which result of our discussion in this thread. I also took the ext4 part > > since there is no risk of conflict and the patch looks obvious. > > Actually, I'd like to hear more about whether this is a real problem, or > if it is just a theoretical problem found during code inspection or from > some static code analysis tool? As far as I understood that was just a theoretical issue and I applied the patch just on the grounds that it is more consistent to use i_mutex for i_generation changes. > With the metadata checksum feature we were discussing using the inode > generation as part of the seed for the directory leaf block checksum, so > that it wasn't possible to incorrectly access stale directory blocks from > a previous incarnation of the same inode number. > > We were discussing just disabling this ioctl on filesystems with metadata > checksums, and printing a deprecation warning for filesystems without that > feature enabled. I'm not aware of any real-world use for this ioctl, since > NFS cannot use it to reconstruct handles because there's no API to allocate > an inode with a specific number, so setting the generation is pointless. OK, I didn't know this. I'm fine with deprecating the ioctl if it's useless but since that's going to take a while I think the cleanup still makes some sense. Honza
On 2012-01-04, at 4:32 PM, Jan Kara wrote: > On Wed 04-01-12 16:15:04, Andreas Dilger wrote: >> On 2012-01-04, at 10:46 AM, Jan Kara wrote: >>> On Tue 03-01-12 02:31:52, Djalal Harouni wrote: >>>> >>>> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, >>>> this can lead to a race with the other operations that update the same >>>> inode. >>>> >>>> Patch tested. >>> >>> OK, so I've taken the patch into my tree, just updated the changelog >>> which result of our discussion in this thread. I also took the ext4 part >>> since there is no risk of conflict and the patch looks obvious. >> >> Actually, I'd like to hear more about whether this is a real problem, or >> if it is just a theoretical problem found during code inspection or from >> some static code analysis tool? > > As far as I understood that was just a theoretical issue and I applied > the patch just on the grounds that it is more consistent to use i_mutex for > i_generation changes. > >> With the metadata checksum feature we were discussing using the inode >> generation as part of the seed for the directory leaf block checksum, so >> that it wasn't possible to incorrectly access stale directory blocks from >> a previous incarnation of the same inode number. >> >> We were discussing just disabling this ioctl on filesystems with metadata >> checksums, and printing a deprecation warning for filesystems without that >> feature enabled. I'm not aware of any real-world use for this ioctl, since >> NFS cannot use it to reconstruct handles because there's no API to allocate >> an inode with a specific number, so setting the generation is pointless. > > OK, I didn't know this. I'm fine with deprecating the ioctl if it's > useless but since that's going to take a while I think the cleanup still > makes some sense. I'm not against landing the patch, and I agree that there is no question about the performance impact of making this ioctl safe. My real question was whether there was a real-world use for this ioctl which might prevent it from being deprecated. Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 05, 2012 at 12:32:54AM +0100, Jan Kara wrote: > On Wed 04-01-12 16:15:04, Andreas Dilger wrote: > > On 2012-01-04, at 10:46 AM, Jan Kara wrote: > > > On Tue 03-01-12 02:31:52, Djalal Harouni wrote: > > >> > > >> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, > > >> this can lead to a race with the other operations that update the same > > >> inode. > > >> > > >> Patch tested. > > > > > > OK, so I've taken the patch into my tree, just updated the changelog > > > which result of our discussion in this thread. I also took the ext4 part > > > since there is no risk of conflict and the patch looks obvious. > > > > Actually, I'd like to hear more about whether this is a real problem, or > > if it is just a theoretical problem found during code inspection or from > > some static code analysis tool? > As far as I understood that was just a theoretical issue and I applied > the patch just on the grounds that it is more consistent to use i_mutex for > i_generation changes. This was found using a static code analysis tool (currently a PoC) which is a part of a research project at our university. And later, code inspection confirms that i_ctime updates can be disturbed. I should have specified this. Sorry. > > With the metadata checksum feature we were discussing using the inode > > generation as part of the seed for the directory leaf block checksum, so > > that it wasn't possible to incorrectly access stale directory blocks from > > a previous incarnation of the same inode number. > > > > We were discussing just disabling this ioctl on filesystems with metadata > > checksums, and printing a deprecation warning for filesystems without that > > feature enabled. I'm not aware of any real-world use for this ioctl, since > > NFS cannot use it to reconstruct handles because there's no API to allocate > > an inode with a specific number, so setting the generation is pointless. > OK, I didn't know this. I'm fine with deprecating the ioctl if it's > useless but since that's going to take a while I think the cleanup still > makes some sense. Actually I've grepped this ioctl but did not found use cases, but as ext{3,2} also support it, I did not say anything (this is old, there is even the EXT4_IOC_SETVERSION_OLD ioctl ?). I don't know if this ioctl is used or not. Only the reiserfs and ext{2,3,4} filesystems support this ioctl. The reiserfs do not use mutexes at all, even in the REISERFS_IOC_SETFLAGS ioctl which will test and set _all_ the possible values of the i_flags field. Perhaps I should also send a patch for this ? And perhaps ext2 should also be updated. > Honza > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR Thanks for the feedback.
On Thu 05-01-12 01:40:09, Djalal Harouni wrote: > On Thu, Jan 05, 2012 at 12:32:54AM +0100, Jan Kara wrote: > > > With the metadata checksum feature we were discussing using the inode > > > generation as part of the seed for the directory leaf block checksum, so > > > that it wasn't possible to incorrectly access stale directory blocks from > > > a previous incarnation of the same inode number. > > > > > > We were discussing just disabling this ioctl on filesystems with metadata > > > checksums, and printing a deprecation warning for filesystems without that > > > feature enabled. I'm not aware of any real-world use for this ioctl, since > > > NFS cannot use it to reconstruct handles because there's no API to allocate > > > an inode with a specific number, so setting the generation is pointless. > > OK, I didn't know this. I'm fine with deprecating the ioctl if it's > > useless but since that's going to take a while I think the cleanup still > > makes some sense. > Actually I've grepped this ioctl but did not found use cases, but as > ext{3,2} also support it, I did not say anything (this is old, there is > even the EXT4_IOC_SETVERSION_OLD ioctl ?). I don't know if this ioctl is > used or not. > > Only the reiserfs and ext{2,3,4} filesystems support this ioctl. The reiserfs > do not use mutexes at all, even in the REISERFS_IOC_SETFLAGS ioctl which will > test and set _all_ the possible values of the i_flags field. > Perhaps I should also send a patch for this ? Yes, possibly reiserfs should use i_mutex for that ioctl. > And perhaps ext2 should also be updated. Sure. Send a patch my way when you have it. Honza
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c index ba1b54e..e7b2ed9 100644 --- a/fs/ext3/ioctl.c +++ b/fs/ext3/ioctl.c @@ -134,10 +134,11 @@ flags_out: goto setversion_out; } + mutex_lock(&inode->i_mutex); handle = ext3_journal_start(inode, 1); if (IS_ERR(handle)) { err = PTR_ERR(handle); - goto setversion_out; + goto unlock_out; } err = ext3_reserve_inode_write(handle, inode, &iloc); if (err == 0) { @@ -146,6 +147,9 @@ flags_out: err = ext3_mark_iloc_dirty(handle, inode, &iloc); } ext3_journal_stop(handle); + +unlock_out: + mutex_unlock(&inode->i_mutex); setversion_out: mnt_drop_write(filp->f_path.mnt); return err; diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index a567968..46a8de6 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -158,10 +158,11 @@ flags_out: goto setversion_out; } + mutex_lock(&inode->i_mutex); handle = ext4_journal_start(inode, 1); if (IS_ERR(handle)) { err = PTR_ERR(handle); - goto setversion_out; + goto unlock_out; } err = ext4_reserve_inode_write(handle, inode, &iloc); if (err == 0) { @@ -170,6 +171,9 @@ flags_out: err = ext4_mark_iloc_dirty(handle, inode, &iloc); } ext4_journal_stop(handle); + +unlock_out: + mutex_unlock(&inode->i_mutex); setversion_out: mnt_drop_write(filp->f_path.mnt); return err;
The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex, this can lead to a race with the other operations that update the same inode. Patch tested. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> --- fs/ext3/ioctl.c | 6 +++++- fs/ext4/ioctl.c | 6 +++++- 2 files changed, 10 insertions(+), 2 deletions(-)