Message ID | e2d845a3-a863-1506-a0de-576049c84e81@nod.at |
---|---|
State | Not Applicable |
Headers | show |
Hi Richard, On Mon, May 15, 2017 at 04:39:23PM +0200, Richard Weinberger wrote: > Hi! > > on UBIFS, fstest generic/429 fails due to -ENFILE because the internal orphan > list reaches the maximum size. > When you unlink a file, the inode goes into the orphan list, in UBIFS' evict() function > it is being removed later. So, only unlinked but used inodes should stay in that list. > > If a directory is encrypted, evict() is not being called although the inode has no > users anymore. > It turned out evict() is not being called because fscrypt's fscrypt_d_revalidate() > function. > When I omit fscrypt_set_d_op() in UBIFS code, the test works just fine. Well, I assume you mean the t_encrypted_d_revalidate portion of the test. generic/429 will still fail overall if you remove fscrypt_set_d_op() --- which is expected, since it's testing dentry revalidation after all. > > Can it be that fscrypt_d_revalidate() misses the case of i_nlink being 0? > It seem to treat unlinked inodes as already gone and they stay. > > The following change makes the problem go away here: > > diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c > index 6d6eca394d4d..d0c19838e513 100644 > --- a/fs/crypto/crypto.c > +++ b/fs/crypto/crypto.c > @@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page); > static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) > { > struct dentry *dir; > + struct inode *inode = d_inode(dentry); > int dir_has_key, cached_with_key; > > if (flags & LOOKUP_RCU) > @@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) > (!cached_with_key && dir_has_key) || > (cached_with_key && !dir_has_key)) > return 0; > + > + if (!inode || inode->i_nlink == 0) > + return 0; > + > return 1; > } > > Does this change make sense? TBH, I'm not really an expert in this area and it is also > not clear to me why you don't see these issue on ext4 or f2fs. > Maybe UBIFS' limitations kick in much earlier. ;-) The test is repeatedly creating and removing a directory "dir" while lookups are being done in it. It seems the problem is that many dentries are being created for "dir", and they pin many different inodes, all at the same time. This actually happens for ext4 too; it just doesn't cause an observable error. I doubt it's the right solution to make fscrypt_d_revalidate() look at ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the inode. I think there is probably a VFS bug that is causing the dentries to not be freed. Eric
Eric, Am 15.05.2017 um 21:45 schrieb Eric Biggers: >> If a directory is encrypted, evict() is not being called although the inode has no >> users anymore. >> It turned out evict() is not being called because fscrypt's fscrypt_d_revalidate() >> function. >> When I omit fscrypt_set_d_op() in UBIFS code, the test works just fine. > > Well, I assume you mean the t_encrypted_d_revalidate portion of the test. > generic/429 will still fail overall if you remove fscrypt_set_d_op() --- which > is expected, since it's testing dentry revalidation after all. Ah, yes. >> >> Can it be that fscrypt_d_revalidate() misses the case of i_nlink being 0? >> It seem to treat unlinked inodes as already gone and they stay. >> >> The following change makes the problem go away here: >> >> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c >> index 6d6eca394d4d..d0c19838e513 100644 >> --- a/fs/crypto/crypto.c >> +++ b/fs/crypto/crypto.c >> @@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page); >> static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) >> { >> struct dentry *dir; >> + struct inode *inode = d_inode(dentry); >> int dir_has_key, cached_with_key; >> >> if (flags & LOOKUP_RCU) >> @@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) >> (!cached_with_key && dir_has_key) || >> (cached_with_key && !dir_has_key)) >> return 0; >> + >> + if (!inode || inode->i_nlink == 0) >> + return 0; >> + >> return 1; >> } >> >> Does this change make sense? TBH, I'm not really an expert in this area and it is also >> not clear to me why you don't see these issue on ext4 or f2fs. >> Maybe UBIFS' limitations kick in much earlier. ;-) > > The test is repeatedly creating and removing a directory "dir" while lookups are > being done in it. It seems the problem is that many dentries are being created > for "dir", and they pin many different inodes, all at the same time. This > actually happens for ext4 too; it just doesn't cause an observable error. > > I doubt it's the right solution to make fscrypt_d_revalidate() look at > ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the > inode. I think there is probably a VFS bug that is causing the dentries to not > be freed. Not sure. Al? :-) Thanks, //richard
On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote: > > > > The test is repeatedly creating and removing a directory "dir" while lookups are > > being done in it. It seems the problem is that many dentries are being created > > for "dir", and they pin many different inodes, all at the same time. This > > actually happens for ext4 too; it just doesn't cause an observable error. > > > > I doubt it's the right solution to make fscrypt_d_revalidate() look at > > ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the > > inode. I think there is probably a VFS bug that is causing the dentries to not > > be freed. > > Not sure. Al? :-) > I can reproduce this on an unencrypted directory after updating path_init() in fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in ref-walk mode. So I think fscrypt_d_revalidate() was only relevant because it causes all path lookups to drop out of rcu-walk mode. It seems that what's happening is the "dir" dentries are not being freed because each one has a child dentry "file" that is a negative dentry. The "file" dentry would normally be freed by shrink_dcache_parent() called from vfs_rmdir(), but due to a race with stat("dir/file") the "file" dentry sometimes has nonzero reference count at that time, causing it to remain in the subdirs list. So we end up with a negative dentry "file" with 0 refcount and on the dentry LRU list, and its parent the positive dentry "dir" with 1 refcount. And the test program generates thousands of copies of that, with each "dir" referring to a different inode, and they get freed only when the shrinker runs. I'm not sure how to fix it... Eric
On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote: > > I doubt it's the right solution to make fscrypt_d_revalidate() look at > > ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the > > inode. I think there is probably a VFS bug that is causing the dentries to not > > be freed. > > Not sure. Al? :-) What's to tell VFS that they are garbage? If your code doesn't do it, VFS has no way to know. And I certainly agree that using ->d_revalidate() to trigger their unhashing is a bloody odd design; take it up with whoever had come up with it...
Eric, Am 16.05.2017 um 01:25 schrieb Eric Biggers: > On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote: >>> >>> The test is repeatedly creating and removing a directory "dir" while lookups are >>> being done in it. It seems the problem is that many dentries are being created >>> for "dir", and they pin many different inodes, all at the same time. This >>> actually happens for ext4 too; it just doesn't cause an observable error. >>> >>> I doubt it's the right solution to make fscrypt_d_revalidate() look at >>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the >>> inode. I think there is probably a VFS bug that is causing the dentries to not >>> be freed. >> >> Not sure. Al? :-) >> > > I can reproduce this on an unencrypted directory after updating path_init() in > fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in > ref-walk mode. So I think fscrypt_d_revalidate() was only relevant because it > causes all path lookups to drop out of rcu-walk mode. On ext4 or UBIFS? Thanks, //richard
Al, Am 16.05.2017 um 01:50 schrieb Al Viro: > On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote: > >>> I doubt it's the right solution to make fscrypt_d_revalidate() look at >>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the >>> inode. I think there is probably a VFS bug that is causing the dentries to not >>> be freed. >> >> Not sure. Al? :-) > > What's to tell VFS that they are garbage? If your code doesn't do it, > VFS has no way to know. And I certainly agree that using ->d_revalidate() > to trigger their unhashing is a bloody odd design; take it up with whoever > had come up with it... Not sure whether I understood your answer. What is the preferred way to inform VFS that the dentry is no longer valid? Thanks, //richard
On Tue, May 16, 2017 at 08:47:43AM +0200, Richard Weinberger wrote: > Eric, > > Am 16.05.2017 um 01:25 schrieb Eric Biggers: > > On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote: > >>> > >>> The test is repeatedly creating and removing a directory "dir" while lookups are > >>> being done in it. It seems the problem is that many dentries are being created > >>> for "dir", and they pin many different inodes, all at the same time. This > >>> actually happens for ext4 too; it just doesn't cause an observable error. > >>> > >>> I doubt it's the right solution to make fscrypt_d_revalidate() look at > >>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the > >>> inode. I think there is probably a VFS bug that is causing the dentries to not > >>> be freed. > >> > >> Not sure. Al? :-) > >> > > > > I can reproduce this on an unencrypted directory after updating path_init() in > > fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in > > ref-walk mode. So I think fscrypt_d_revalidate() was only relevant because it > > causes all path lookups to drop out of rcu-walk mode. > > On ext4 or UBIFS? > > Thanks, > //richard Both; the inode "leak" isn't filesystem-specific, beyond the fact that UBIFS apparently limits the number of inodes on its orphan list while ext4 does not. (How do I know it happens on ext4 then? /proc/slabinfo shows that lots of ext4 inodes have been allocated, and after an unclean shutdown and remounting, it's reported that 50,000+ orphan inodes were deleted.) There are lots of cases in which path lookups switch from rcu-walk mode into ref-walk mode, so the fact that it was being caused by ->d_revalidate() in this specific situation isn't really important, IMO. I think the actual fix would be something along the lines of making vfs_rmdir() unhash any negative child dentries, so that they get "killed" by dput() later. Eric
Eric, Am 17.05.2017 um 00:07 schrieb Eric Biggers: >>> I can reproduce this on an unencrypted directory after updating path_init() in >>> fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in >>> ref-walk mode. So I think fscrypt_d_revalidate() was only relevant because it >>> causes all path lookups to drop out of rcu-walk mode. >> >> On ext4 or UBIFS? > > Both; the inode "leak" isn't filesystem-specific, beyond the fact that UBIFS > apparently limits the number of inodes on its orphan list while ext4 does not. > (How do I know it happens on ext4 then? /proc/slabinfo shows that lots of ext4 > inodes have been allocated, and after an unclean shutdown and remounting, it's > reported that 50,000+ orphan inodes were deleted.) UBIFS's orphan list is limited by the size of a LEB (logical erase block). With nandsim's default settings such a LEB is rather small and can store only 2k orphans. > There are lots of cases in which path lookups switch from rcu-walk mode into > ref-walk mode, so the fact that it was being caused by ->d_revalidate() in this > specific situation isn't really important, IMO. > > I think the actual fix would be something along the lines of making vfs_rmdir() > unhash any negative child dentries, so that they get "killed" by dput() later. Thanks, //richard
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 6d6eca394d4d..d0c19838e513 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page); static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) { struct dentry *dir; + struct inode *inode = d_inode(dentry); int dir_has_key, cached_with_key; if (flags & LOOKUP_RCU) @@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) (!cached_with_key && dir_has_key) || (cached_with_key && !dir_has_key)) return 0; + + if (!inode || inode->i_nlink == 0) + return 0; + return 1; }