Message ID | 20200616202123.12656-1-msys.mizuma@gmail.com |
---|---|
State | New |
Headers | show |
Series | fs: i_version mntopt gets visible through /proc/mounts | expand |
On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > > /proc/mounts doesn't show 'i_version' even if iversion > mount option is set to XFS. > > iversion mount option is a VFS option, not ext4 specific option. > Move the handler to show_sb_opts() so that /proc/mounts can show > 'i_version' on not only ext4 but also the other filesystem. SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version mount option.
On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: > On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: > > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > > > > /proc/mounts doesn't show 'i_version' even if iversion > > mount option is set to XFS. > > > > iversion mount option is a VFS option, not ext4 specific option. > > Move the handler to show_sb_opts() so that /proc/mounts can show > > 'i_version' on not only ext4 but also the other filesystem. > > SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version > mount option. Thank you for pointing it out, I misunderstood that for XFS. iversion mount option is a VFS option, so I think it's good to show 'i_version' on /proc/mounts if the filesystem has i_version feature, like as: $ cat /proc/mounts ... /dev/vdb1 /mnt xfs rw,i_version,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0 ... - Masa
On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: > On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: > > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > > > > /proc/mounts doesn't show 'i_version' even if iversion > > mount option is set to XFS. > > > > iversion mount option is a VFS option, not ext4 specific option. > > Move the handler to show_sb_opts() so that /proc/mounts can show > > 'i_version' on not only ext4 but also the other filesystem. > > SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version > mount option. It probably *should* be a kernel internal flag, but it seems to work as a mount option too. By coincidence I've just been looking at a bug report showing that i_version support is getting accidentally turned off on XFS whenever userspace does a read-write remount. Is there someplace in the xfs mount code where it should be throwing out SB_I_VERSION? Ideally there'd be entirely different fields for mount options and internal feature flags. But I don't know, maybe SB_I_VERSION is the only flag we have like this. --b.
On 6/17/20 10:58 AM, J. Bruce Fields wrote: > On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: >> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: >>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> >>> >>> /proc/mounts doesn't show 'i_version' even if iversion >>> mount option is set to XFS. >>> >>> iversion mount option is a VFS option, not ext4 specific option. >>> Move the handler to show_sb_opts() so that /proc/mounts can show >>> 'i_version' on not only ext4 but also the other filesystem. >> >> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version >> mount option. > > It probably *should* be a kernel internal flag, but it seems to work as > a mount option too. Not on XFS AFAICT: [600280.685810] xfs: Unknown parameter 'i_version' so we can't be exporting "mount options" for xfs that aren't actually going to be accepted by the filesystem. > By coincidence I've just been looking at a bug report showing that > i_version support is getting accidentally turned off on XFS whenever > userspace does a read-write remount. > > Is there someplace in the xfs mount code where it should be throwing out > SB_I_VERSION? <cc xfs list> XFS doesn't manipulate that flag on remount. We just turn it on by default for modern filesystem formats: /* version 5 superblocks support inode version counters. */ if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) sb->s_flags |= SB_I_VERSION; Also, this behavior doesn't seem unique to xfs: # mount -o loop,i_version fsfile test_iversion # grep test_iversion /proc/mounts /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0 # mount -o remount test_iversion # grep test_iversion /proc/mounts /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0 # uname -a Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux -Eric > Ideally there'd be entirely different fields for mount options and > internal feature flags. But I don't know, maybe SB_I_VERSION is the > only flag we have like this. > > --b. >
On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote: > > > On 6/17/20 10:58 AM, J. Bruce Fields wrote: > > On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: > >> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: > >>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > >>> > >>> /proc/mounts doesn't show 'i_version' even if iversion > >>> mount option is set to XFS. > >>> > >>> iversion mount option is a VFS option, not ext4 specific option. > >>> Move the handler to show_sb_opts() so that /proc/mounts can show > >>> 'i_version' on not only ext4 but also the other filesystem. > >> > >> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version > >> mount option. > > > > It probably *should* be a kernel internal flag, but it seems to work as > > a mount option too. > > Not on XFS AFAICT: > > [600280.685810] xfs: Unknown parameter 'i_version' Yeah, because the mount option is 'iversion', not 'i_version'. Even if you were going to expose the flag state in /proc/mounts, the text string should match the mount option. > so we can't be exporting "mount options" for xfs that aren't actually > going to be accepted by the filesystem. > > > By coincidence I've just been looking at a bug report showing that > > i_version support is getting accidentally turned off on XFS whenever > > userspace does a read-write remount. > > > > Is there someplace in the xfs mount code where it should be throwing out > > SB_I_VERSION? > > <cc xfs list> > > XFS doesn't manipulate that flag on remount. We just turn it on by default > for modern filesystem formats: > > /* version 5 superblocks support inode version counters. */ > if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > sb->s_flags |= SB_I_VERSION; > > Also, this behavior doesn't seem unique to xfs: > > # mount -o loop,i_version fsfile test_iversion > # grep test_iversion /proc/mounts > /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0 > # mount -o remount test_iversion > # grep test_iversion /proc/mounts > /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0 > # uname -a > Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux Probably because do_mount clears it and I guess xfs don't re-enable it in any of their remount functions... --D > -Eric > > > Ideally there'd be entirely different fields for mount options and > > internal feature flags. But I don't know, maybe SB_I_VERSION is the > > only flag we have like this. > > > > --b. > >
On 6/17/20 12:24 PM, Darrick J. Wong wrote: > On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote: >> >> >> On 6/17/20 10:58 AM, J. Bruce Fields wrote: >>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: >>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: >>>>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> >>>>> >>>>> /proc/mounts doesn't show 'i_version' even if iversion >>>>> mount option is set to XFS. >>>>> >>>>> iversion mount option is a VFS option, not ext4 specific option. >>>>> Move the handler to show_sb_opts() so that /proc/mounts can show >>>>> 'i_version' on not only ext4 but also the other filesystem. >>>> >>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version >>>> mount option. >>> >>> It probably *should* be a kernel internal flag, but it seems to work as >>> a mount option too. >> >> Not on XFS AFAICT: >> >> [600280.685810] xfs: Unknown parameter 'i_version' > > Yeah, because the mount option is 'iversion', not 'i_version'. Even if unless you're ext4: {Opt_i_version, "i_version"}, ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick) # strace -vv -emount mount -oloop,iversion fsfile mnt mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0 FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4: # strace -vv -emount mount -o remount mnt mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0 but it still looks unhandled on remount. Perhaps if /proc/mounts exposed "iversion" (not "i_version") then mount -o remount would DTRT. -Eric > you were going to expose the flag state in /proc/mounts, the text string > should match the mount option. > >> so we can't be exporting "mount options" for xfs that aren't actually >> going to be accepted by the filesystem. >> >>> By coincidence I've just been looking at a bug report showing that >>> i_version support is getting accidentally turned off on XFS whenever >>> userspace does a read-write remount. >>> >>> Is there someplace in the xfs mount code where it should be throwing out >>> SB_I_VERSION? >> >> <cc xfs list> >> >> XFS doesn't manipulate that flag on remount. We just turn it on by default >> for modern filesystem formats: >> >> /* version 5 superblocks support inode version counters. */ >> if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) >> sb->s_flags |= SB_I_VERSION; >> >> Also, this behavior doesn't seem unique to xfs: >> >> # mount -o loop,i_version fsfile test_iversion >> # grep test_iversion /proc/mounts >> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0 >> # mount -o remount test_iversion >> # grep test_iversion /proc/mounts >> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0 >> # uname -a >> Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux > > Probably because do_mount clears it and I guess xfs don't re-enable > it in any of their remount functions...
On Wed, Jun 17, 2020 at 12:55:24PM -0500, Eric Sandeen wrote: > On 6/17/20 12:24 PM, Darrick J. Wong wrote: > > On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote: > >> > >> > >> On 6/17/20 10:58 AM, J. Bruce Fields wrote: > >>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: > >>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: > >>>>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > >>>>> > >>>>> /proc/mounts doesn't show 'i_version' even if iversion > >>>>> mount option is set to XFS. > >>>>> > >>>>> iversion mount option is a VFS option, not ext4 specific option. > >>>>> Move the handler to show_sb_opts() so that /proc/mounts can show > >>>>> 'i_version' on not only ext4 but also the other filesystem. > >>>> > >>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version > >>>> mount option. > >>> > >>> It probably *should* be a kernel internal flag, but it seems to work as > >>> a mount option too. > >> > >> Not on XFS AFAICT: > >> > >> [600280.685810] xfs: Unknown parameter 'i_version' > > > > Yeah, because the mount option is 'iversion', not 'i_version'. Even if > > unless you're ext4: > > {Opt_i_version, "i_version"}, > > ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick) > > # strace -vv -emount mount -oloop,iversion fsfile mnt > mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0 > > FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4: > > # strace -vv -emount mount -o remount mnt > mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0 > > but it still looks unhandled on remount. Perhaps if /proc/mounts exposed > "iversion" (not "i_version") then mount -o remount would DTRT. I'd rather just eliminate the option, to the extent possible. It was only ever a mount option since it caused a performance regression in some filesystems, but I *think* that was addressed by Jeff Layton's work (f02a9ad1f15d "fs: handle inode->i_version more efficiently"). XFS in particular is just using this flag to tell knfsd that it should use i_version. I don't think it was really intended for userspace to be able to turn this off. --b.
On 6/17/20 1:18 PM, J. Bruce Fields wrote: > On Wed, Jun 17, 2020 at 12:55:24PM -0500, Eric Sandeen wrote: >> On 6/17/20 12:24 PM, Darrick J. Wong wrote: >>> On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote: >>>> >>>> >>>> On 6/17/20 10:58 AM, J. Bruce Fields wrote: >>>>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote: >>>>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote: >>>>>>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> >>>>>>> >>>>>>> /proc/mounts doesn't show 'i_version' even if iversion >>>>>>> mount option is set to XFS. >>>>>>> >>>>>>> iversion mount option is a VFS option, not ext4 specific option. >>>>>>> Move the handler to show_sb_opts() so that /proc/mounts can show >>>>>>> 'i_version' on not only ext4 but also the other filesystem. >>>>>> >>>>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version >>>>>> mount option. >>>>> >>>>> It probably *should* be a kernel internal flag, but it seems to work as >>>>> a mount option too. >>>> >>>> Not on XFS AFAICT: >>>> >>>> [600280.685810] xfs: Unknown parameter 'i_version' >>> >>> Yeah, because the mount option is 'iversion', not 'i_version'. Even if >> >> unless you're ext4: >> >> {Opt_i_version, "i_version"}, >> >> ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick) >> >> # strace -vv -emount mount -oloop,iversion fsfile mnt >> mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0 >> >> FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4: >> >> # strace -vv -emount mount -o remount mnt >> mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0 >> >> but it still looks unhandled on remount. Perhaps if /proc/mounts exposed >> "iversion" (not "i_version") then mount -o remount would DTRT. > > I'd rather just eliminate the option, to the extent possible. > > It was only ever a mount option since it caused a performance regression > in some filesystems, but I *think* that was addressed by Jeff Layton's > work (f02a9ad1f15d "fs: handle inode->i_version more efficiently"). > > XFS in particular is just using this flag to tell knfsd that it should > use i_version. I don't think it was really intended for userspace to be > able to turn this off. but mount(8) has already exposed this interface: iversion Every time the inode is modified, the i_version field will be incremented. noiversion Do not increment the i_version inode field. so now what? FWIW, exporting "iversion" in /proc/mounts will 1) tell us whether the SB_I_VERSION is or is not in fact set on the fs, and 2) will make remount DTRT because mount(8) will properly parse it and send it back in via the "flags" var during remount. # mount -o loop fsfile mnt # grep mnt /proc/mounts /dev/loop0 /tmp/mnt xfs rw,iversion,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0 # strace -vv -emount mount -o remount mnt mount("/dev/loop0", "/tmp/mnt", 0x55c11d0e3150, MS_REMOUNT|MS_RELATIME|MS_I_VERSION, "seclabel,attr2,inode64,logbufs=8"...) = 0 # grep mnt /proc/mounts /dev/loop0 /tmp/mnt xfs rw,iversion,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0 ext4 i_version will just map back to iversion: # mount -o i_version,loop fsfile mnt # grep mnt /proc/mounts /dev/loop0 /tmp/mnt ext4 rw,iversion,seclabel,relatime 0 0 # mount -o remount mnt # grep mnt /proc/mounts /dev/loop0 /tmp/mnt ext4 rw,iversion,seclabel,relatime 0 0 One wrinkle is that xfs will not honor "noiversion" currently but that's a different question. -Eric
On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > but mount(8) has already exposed this interface: > > iversion > Every time the inode is modified, the i_version field will be incremented. > > noiversion > Do not increment the i_version inode field. > > so now what? It's not like anyone's actually depending on i_version *not* being incremented. (Can you even observe it from userspace other than over NFS?) So, just silently turn on the "iversion" behavior and ignore noiversion, and I doubt you're going to break any real application. --b.
On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > but mount(8) has already exposed this interface: > > > > iversion > > Every time the inode is modified, the i_version field will be incremented. > > > > noiversion > > Do not increment the i_version inode field. > > > > so now what? > > It's not like anyone's actually depending on i_version *not* being > incremented. (Can you even observe it from userspace other than over > NFS?) > > So, just silently turn on the "iversion" behavior and ignore noiversion, > and I doubt you're going to break any real application. I suppose it's probably good to remain the options for user compatibility, however, it seems that iversion and noiversiont are useful for only ext4. How about moving iversion and noiversion description on mount(8) to ext4 specific option? And fixing the remount issue for XFS (maybe btrfs has the same issue as well)? For XFS like as: diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 379cbff438bc..2ddd634cfb0b 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( return error; } + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) + mp->m_super->s_flags |= SB_I_VERSION; + return 0; } Thanks, Masa
On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote: > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > > but mount(8) has already exposed this interface: > > > > > > iversion > > > Every time the inode is modified, the i_version field will be incremented. > > > > > > noiversion > > > Do not increment the i_version inode field. > > > > > > so now what? > > > > It's not like anyone's actually depending on i_version *not* being > > incremented. (Can you even observe it from userspace other than over > > NFS?) > > > > So, just silently turn on the "iversion" behavior and ignore noiversion, > > and I doubt you're going to break any real application. > > I suppose it's probably good to remain the options for user compatibility, > however, it seems that iversion and noiversiont are useful for > only ext4. > How about moving iversion and noiversion description on mount(8) > to ext4 specific option? > > And fixing the remount issue for XFS (maybe btrfs has the same > issue as well)? > For XFS like as: > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 379cbff438bc..2ddd634cfb0b 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( > return error; > } > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > + mp->m_super->s_flags |= SB_I_VERSION; > + I wonder, does this have to be done at the top of this function because the vfs already removed S_I_VERSION from s_flags? --D > return 0; > } > > Thanks, > Masa
On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote: > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > > but mount(8) has already exposed this interface: > > > > > > iversion > > > Every time the inode is modified, the i_version field will be incremented. > > > > > > noiversion > > > Do not increment the i_version inode field. > > > > > > so now what? > > > > It's not like anyone's actually depending on i_version *not* being > > incremented. (Can you even observe it from userspace other than over > > NFS?) > > > > So, just silently turn on the "iversion" behavior and ignore noiversion, > > and I doubt you're going to break any real application. > > I suppose it's probably good to remain the options for user compatibility, > however, it seems that iversion and noiversiont are useful for > only ext4. > How about moving iversion and noiversion description on mount(8) > to ext4 specific option? > > And fixing the remount issue for XFS (maybe btrfs has the same > issue as well)? > For XFS like as: > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 379cbff438bc..2ddd634cfb0b 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( > return error; > } > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > + mp->m_super->s_flags |= SB_I_VERSION; > + > return 0; > } no this doesn't work, because the sueprblock flags are modified after ->reconfigure is called. i.e. reconfigure_super() does this: if (fc->ops->reconfigure) { retval = fc->ops->reconfigure(fc); if (retval) { if (!force) goto cancel_readonly; /* If forced remount, go ahead despite any errors */ WARN(1, "forced remount of a %s fs returned %i\n", sb->s_type->name, retval); } } WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | (fc->sb_flags & fc->sb_flags_mask))); And it's the WRITE_ONCE() line that clears SB_I_VERSION out of sb->s_flags. Hence adding it in ->reconfigure doesn't help. What we actually want to do here in xfs_fc_reconfigure() is this: if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) fc->sb_flags_mask |= SB_I_VERSION; So that the SB_I_VERSION is not cleared from sb->s_flags. I'll also note that btrfs will need the same fix, because it also sets SB_I_VERSION unconditionally, as will any other filesystem that does this, too. Really, this is just indicative of the mess that the mount flags vs superblock feature flags are. Filesystems can choose to unconditionally support various superblock features, and no mount option futzing from userspace should -ever- be able to change that feature. Filesystems really do need to be able to override mount options that were parsed in userspace and turned into a binary flag... Cheers, Dave.
On Wed, Jun 17, 2020 at 06:44:29PM -0700, Darrick J. Wong wrote: > On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote: > > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > > > but mount(8) has already exposed this interface: > > > > > > > > iversion > > > > Every time the inode is modified, the i_version field will be incremented. > > > > > > > > noiversion > > > > Do not increment the i_version inode field. > > > > > > > > so now what? > > > > > > It's not like anyone's actually depending on i_version *not* being > > > incremented. (Can you even observe it from userspace other than over > > > NFS?) > > > > > > So, just silently turn on the "iversion" behavior and ignore noiversion, > > > and I doubt you're going to break any real application. > > > > I suppose it's probably good to remain the options for user compatibility, > > however, it seems that iversion and noiversiont are useful for > > only ext4. > > How about moving iversion and noiversion description on mount(8) > > to ext4 specific option? > > > > And fixing the remount issue for XFS (maybe btrfs has the same > > issue as well)? > > For XFS like as: > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 379cbff438bc..2ddd634cfb0b 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( > > return error; > > } > > > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > > + mp->m_super->s_flags |= SB_I_VERSION; > > + > > I wonder, does this have to be done at the top of this function because > the vfs already removed S_I_VERSION from s_flags? Ah, I found the above code doesn't work... sb->s_flags will be updated after reconfigure(): int reconfigure_super(struct fs_context *fc) { ... if (fc->ops->reconfigure) { retval = fc->ops->reconfigure(fc); if (retval) { if (!force) goto cancel_readonly; /* If forced remount, go ahead despite any errors */ WARN(1, "forced remount of a %s fs returned %i\n", sb->s_type->name, retval); } } WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | (fc->sb_flags & fc->sb_flags_mask))); Here, fc->sb_flags_mask should be MS_RMT_MASK, so SB_I_VERSION will be dropped except iversion mount opstion (MS_I_VERSION) is set. - Masa
On Thu, Jun 18, 2020 at 01:05:39PM +1000, Dave Chinner wrote: > On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote: > > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > > > but mount(8) has already exposed this interface: > > > > > > > > iversion > > > > Every time the inode is modified, the i_version field will be incremented. > > > > > > > > noiversion > > > > Do not increment the i_version inode field. > > > > > > > > so now what? > > > > > > It's not like anyone's actually depending on i_version *not* being > > > incremented. (Can you even observe it from userspace other than over > > > NFS?) > > > > > > So, just silently turn on the "iversion" behavior and ignore noiversion, > > > and I doubt you're going to break any real application. > > > > I suppose it's probably good to remain the options for user compatibility, > > however, it seems that iversion and noiversiont are useful for > > only ext4. > > How about moving iversion and noiversion description on mount(8) > > to ext4 specific option? > > > > And fixing the remount issue for XFS (maybe btrfs has the same > > issue as well)? > > For XFS like as: > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 379cbff438bc..2ddd634cfb0b 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( > > return error; > > } > > > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > > + mp->m_super->s_flags |= SB_I_VERSION; > > + > > return 0; > > } > > no this doesn't work, because the sueprblock flags are modified > after ->reconfigure is called. > > i.e. reconfigure_super() does this: > > if (fc->ops->reconfigure) { > retval = fc->ops->reconfigure(fc); > if (retval) { > if (!force) > goto cancel_readonly; > /* If forced remount, go ahead despite any errors */ > WARN(1, "forced remount of a %s fs returned %i\n", > sb->s_type->name, retval); > } > } > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > (fc->sb_flags & fc->sb_flags_mask))); > > And it's the WRITE_ONCE() line that clears SB_I_VERSION out of > sb->s_flags. Hence adding it in ->reconfigure doesn't help. > > What we actually want to do here in xfs_fc_reconfigure() is this: > > if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > fc->sb_flags_mask |= SB_I_VERSION; > > So that the SB_I_VERSION is not cleared from sb->s_flags. > > I'll also note that btrfs will need the same fix, because it also > sets SB_I_VERSION unconditionally, as will any other filesystem that > does this, too. Thank you for pointed it out. How about following change? I believe it works both xfs and btrfs... diff --git a/fs/super.c b/fs/super.c index b0a511bef4a0..42fc6334d384 100644 --- a/fs/super.c +++ b/fs/super.c @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) } } + if (sb->s_flags & SB_I_VERSION) + fc->sb_flags |= MS_I_VERSION; + WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | (fc->sb_flags & fc->sb_flags_mask))); /* Needs to be ordered wrt mnt_is_readonly() */ - Masa > > Really, this is just indicative of the mess that the mount > flags vs superblock feature flags are. Filesystems can choose to > unconditionally support various superblock features, and no mount > option futzing from userspace should -ever- be able to change that > feature. Filesystems really do need to be able to override mount > options that were parsed in userspace and turned into a binary > flag... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote: > On Thu, Jun 18, 2020 at 01:05:39PM +1000, Dave Chinner wrote: > > On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote: > > > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote: > > > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote: > > > > > but mount(8) has already exposed this interface: > > > > > > > > > > iversion > > > > > Every time the inode is modified, the i_version field will be incremented. > > > > > > > > > > noiversion > > > > > Do not increment the i_version inode field. > > > > > > > > > > so now what? > > > > > > > > It's not like anyone's actually depending on i_version *not* being > > > > incremented. (Can you even observe it from userspace other than over > > > > NFS?) > > > > > > > > So, just silently turn on the "iversion" behavior and ignore noiversion, > > > > and I doubt you're going to break any real application. > > > > > > I suppose it's probably good to remain the options for user compatibility, > > > however, it seems that iversion and noiversiont are useful for > > > only ext4. > > > How about moving iversion and noiversion description on mount(8) > > > to ext4 specific option? > > > > > > And fixing the remount issue for XFS (maybe btrfs has the same > > > issue as well)? > > > For XFS like as: > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > index 379cbff438bc..2ddd634cfb0b 100644 > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure( > > > return error; > > > } > > > > > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > > > + mp->m_super->s_flags |= SB_I_VERSION; > > > + > > > return 0; > > > } > > > > no this doesn't work, because the sueprblock flags are modified > > after ->reconfigure is called. > > > > i.e. reconfigure_super() does this: > > > > if (fc->ops->reconfigure) { > > retval = fc->ops->reconfigure(fc); > > if (retval) { > > if (!force) > > goto cancel_readonly; > > /* If forced remount, go ahead despite any errors */ > > WARN(1, "forced remount of a %s fs returned %i\n", > > sb->s_type->name, retval); > > } > > } > > > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > > (fc->sb_flags & fc->sb_flags_mask))); > > > > And it's the WRITE_ONCE() line that clears SB_I_VERSION out of > > sb->s_flags. Hence adding it in ->reconfigure doesn't help. > > > > What we actually want to do here in xfs_fc_reconfigure() is this: > > > > if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) > > fc->sb_flags_mask |= SB_I_VERSION; > > > > So that the SB_I_VERSION is not cleared from sb->s_flags. > > > > I'll also note that btrfs will need the same fix, because it also > > sets SB_I_VERSION unconditionally, as will any other filesystem that > > does this, too. > > Thank you for pointed it out. > How about following change? I believe it works both xfs and btrfs... > > diff --git a/fs/super.c b/fs/super.c > index b0a511bef4a0..42fc6334d384 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) > } > } > > + if (sb->s_flags & SB_I_VERSION) > + fc->sb_flags |= MS_I_VERSION; > + > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > (fc->sb_flags & fc->sb_flags_mask))); > /* Needs to be ordered wrt mnt_is_readonly() */ This will prevent SB_I_VERSION from being turned off at all. That will break existing filesystems that allow SB_I_VERSION to be turned off on remount, such as ext4. The manipulations here need to be in the filesystem specific code; we screwed this one up so badly there is no "one size fits all" behaviour that we can implement in the generic code... Cheers, Dave.
On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote: > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote: > > Thank you for pointed it out. > > How about following change? I believe it works both xfs and btrfs... > > > > diff --git a/fs/super.c b/fs/super.c > > index b0a511bef4a0..42fc6334d384 100644 > > --- a/fs/super.c > > +++ b/fs/super.c > > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) > > } > > } > > > > + if (sb->s_flags & SB_I_VERSION) > > + fc->sb_flags |= MS_I_VERSION; > > + > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > > (fc->sb_flags & fc->sb_flags_mask))); > > /* Needs to be ordered wrt mnt_is_readonly() */ > > This will prevent SB_I_VERSION from being turned off at all. That > will break existing filesystems that allow SB_I_VERSION to be turned > off on remount, such as ext4. > > The manipulations here need to be in the filesystem specific code; > we screwed this one up so badly there is no "one size fits all" > behaviour that we can implement in the generic code... My memory was that after Jeff Layton's i_version patches, there wasn't really a significant performance hit any more, so the ability to turn it off is no longer useful. But looking back through Jeff's postings, I don't see him claiming that; e.g. in: https://lore.kernel.org/lkml/20171222120556.7435-1-jlayton@kernel.org/ https://lore.kernel.org/linux-nfs/20180109141059.25929-1-jlayton@kernel.org/ https://lore.kernel.org/linux-nfs/1517228795.5965.24.camel@redhat.com/ he reports comparing old iversion behavior to new iversion behavior, but not new iversion behavior to new noiversion behavior. --b.
On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote: > > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote: > > > Thank you for pointed it out. > > > How about following change? I believe it works both xfs and btrfs... > > > > > > diff --git a/fs/super.c b/fs/super.c > > > index b0a511bef4a0..42fc6334d384 100644 > > > --- a/fs/super.c > > > +++ b/fs/super.c > > > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) > > > } > > > } > > > > > > + if (sb->s_flags & SB_I_VERSION) > > > + fc->sb_flags |= MS_I_VERSION; > > > + > > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > > > (fc->sb_flags & fc->sb_flags_mask))); > > > /* Needs to be ordered wrt mnt_is_readonly() */ > > > > This will prevent SB_I_VERSION from being turned off at all. That > > will break existing filesystems that allow SB_I_VERSION to be turned > > off on remount, such as ext4. > > > > The manipulations here need to be in the filesystem specific code; > > we screwed this one up so badly there is no "one size fits all" > > behaviour that we can implement in the generic code... > > My memory was that after Jeff Layton's i_version patches, there wasn't > really a significant performance hit any more, so the ability to turn it > off is no longer useful. Yes, I completely agree with you here. However, with some filesystems allowing it to be turned off, we can't just wave our hands and force enable the option. Those filesystems - if the maintainers chose to always enable iversion - will have to go through a mount option deprecation period before permanently enabling it. > But looking back through Jeff's postings, I don't see him claiming that; > e.g. in: > > https://lore.kernel.org/lkml/20171222120556.7435-1-jlayton@kernel.org/ > https://lore.kernel.org/linux-nfs/20180109141059.25929-1-jlayton@kernel.org/ > https://lore.kernel.org/linux-nfs/1517228795.5965.24.camel@redhat.com/ > > he reports comparing old iversion behavior to new iversion behavior, but > not new iversion behavior to new noiversion behavior. Yeah, it's had to compare noiversion behaviour on filesystems where it was understood that it couldn't actually be turned off. And, realistically, the comaprison to noiversion wasn't really relevant to the problem Jeff's patchset was addressing... Cheers, Dave.
On Fri, 2020-06-19 at 12:44 +1000, Dave Chinner wrote: > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote: > > > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote: > > > > Thank you for pointed it out. > > > > How about following change? I believe it works both xfs and btrfs... > > > > > > > > diff --git a/fs/super.c b/fs/super.c > > > > index b0a511bef4a0..42fc6334d384 100644 > > > > --- a/fs/super.c > > > > +++ b/fs/super.c > > > > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) > > > > } > > > > } > > > > > > > > + if (sb->s_flags & SB_I_VERSION) > > > > + fc->sb_flags |= MS_I_VERSION; > > > > + > > > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | > > > > (fc->sb_flags & fc->sb_flags_mask))); > > > > /* Needs to be ordered wrt mnt_is_readonly() */ > > > > > > This will prevent SB_I_VERSION from being turned off at all. That > > > will break existing filesystems that allow SB_I_VERSION to be turned > > > off on remount, such as ext4. > > > > > > The manipulations here need to be in the filesystem specific code; > > > we screwed this one up so badly there is no "one size fits all" > > > behaviour that we can implement in the generic code... > > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > really a significant performance hit any more, so the ability to turn it > > off is no longer useful. > > Yes, I completely agree with you here. However, with some > filesystems allowing it to be turned off, we can't just wave our > hands and force enable the option. Those filesystems - if the > maintainers chose to always enable iversion - will have to go > through a mount option deprecation period before permanently > enabling it. > > > But looking back through Jeff's postings, I don't see him claiming that; > > e.g. in: > > > > https://lore.kernel.org/lkml/20171222120556.7435-1-jlayton@kernel.org/ > > https://lore.kernel.org/linux-nfs/20180109141059.25929-1-jlayton@kernel.org/ > > https://lore.kernel.org/linux-nfs/1517228795.5965.24.camel@redhat.com/ > > > > he reports comparing old iversion behavior to new iversion behavior, but > > not new iversion behavior to new noiversion behavior. > > Yeah, it's had to compare noiversion behaviour on filesystems where > it was understood that it couldn't actually be turned off. And, > realistically, the comaprison to noiversion wasn't really relevant > to the problem Jeff's patchset was addressing... > I actually did do some comparison with that patchset vs. noiversion mounted ext4, and found that there was a small performance delta. It wasn't much but it was measurable enough that I didn't want to propose removing the option from ext4 altogether at the time. It may be worth it to do that now though.
On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote: > This will prevent SB_I_VERSION from being turned off at all. That > will break existing filesystems that allow SB_I_VERSION to be turned > off on remount, such as ext4. > > The manipulations here need to be in the filesystem specific code; > we screwed this one up so badly there is no "one size fits all" > behaviour that we can implement in the generic code... Yes. SB_I_VERSION should never be set by common code.
On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > My memory was that after Jeff Layton's i_version patches, there wasn't > > really a significant performance hit any more, so the ability to turn it > > off is no longer useful. > > Yes, I completely agree with you here. However, with some > filesystems allowing it to be turned off, we can't just wave our > hands and force enable the option. Those filesystems - if the > maintainers chose to always enable iversion - will have to go > through a mount option deprecation period before permanently > enabling it. I don't understand why. The filesystem can continue to let people set iversion or noiversion as they like, while under the covers behaving as if iversion is always set. I can't see how that would break any application. (Or even how an application would be able to detect that the filesystem was doing this.) --b. > > > But looking back through Jeff's postings, I don't see him claiming that; > > e.g. in: > > > > https://lore.kernel.org/lkml/20171222120556.7435-1-jlayton@kernel.org/ > > https://lore.kernel.org/linux-nfs/20180109141059.25929-1-jlayton@kernel.org/ > > https://lore.kernel.org/linux-nfs/1517228795.5965.24.camel@redhat.com/ > > > > he reports comparing old iversion behavior to new iversion behavior, but > > not new iversion behavior to new noiversion behavior. > > Yeah, it's had to compare noiversion behaviour on filesystems where > it was understood that it couldn't actually be turned off. And, > realistically, the comaprison to noiversion wasn't really relevant > to the problem Jeff's patchset was addressing... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote: > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > > really a significant performance hit any more, so the ability to turn it > > > off is no longer useful. > > > > Yes, I completely agree with you here. However, with some > > filesystems allowing it to be turned off, we can't just wave our > > hands and force enable the option. Those filesystems - if the > > maintainers chose to always enable iversion - will have to go > > through a mount option deprecation period before permanently > > enabling it. > > I don't understand why. > > The filesystem can continue to let people set iversion or noiversion as > they like, while under the covers behaving as if iversion is always set. > I can't see how that would break any application. (Or even how an > application would be able to detect that the filesystem was doing this.) It doesn't break functionality, but it affects performance. IOWs, it can make certain workloads go a lot slower in some circumstances. And that can result in unexectedly breaking SLAs or slow down a complex, finely tuned data center wide workload to the point it no longer meets requirements. Such changes in behaviour are considered a regression, especially if they result from a change that just ignores the mount option that turned off that specific feature. Cheers, Dave.
On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote: > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote: > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > > > really a significant performance hit any more, so the ability to turn it > > > > off is no longer useful. > > > > > > Yes, I completely agree with you here. However, with some > > > filesystems allowing it to be turned off, we can't just wave our > > > hands and force enable the option. Those filesystems - if the > > > maintainers chose to always enable iversion - will have to go > > > through a mount option deprecation period before permanently > > > enabling it. > > > > I don't understand why. > > > > The filesystem can continue to let people set iversion or noiversion as > > they like, while under the covers behaving as if iversion is always set. > > I can't see how that would break any application. (Or even how an > > application would be able to detect that the filesystem was doing this.) > > It doesn't break functionality, but it affects performance. I thought you just agreed above that any performance hit was not "significant". > IOWs, it can make certain workloads go a lot slower in some > circumstances. And that can result in unexectedly breaking SLAs or > slow down a complex, finely tuned data center wide workload to the > point it no longer meets requirements. Such changes in behaviour are > considered a regression, especially if they result from a change that > just ignores the mount option that turned off that specific feature. I get that, but, what's the threshhold here for a significant risk of regression? The "noiversion" behavior is kinda painful for NFS. --b.
On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote: > On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote: > > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote: > > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > > > > really a significant performance hit any more, so the ability to turn it > > > > > off is no longer useful. > > > > > > > > Yes, I completely agree with you here. However, with some > > > > filesystems allowing it to be turned off, we can't just wave our > > > > hands and force enable the option. Those filesystems - if the > > > > maintainers chose to always enable iversion - will have to go > > > > through a mount option deprecation period before permanently > > > > enabling it. > > > > > > I don't understand why. > > > > > > The filesystem can continue to let people set iversion or noiversion as > > > they like, while under the covers behaving as if iversion is always set. > > > I can't see how that would break any application. (Or even how an > > > application would be able to detect that the filesystem was doing this.) > > > > It doesn't break functionality, but it affects performance. > > I thought you just agreed above that any performance hit was not > "significant". Yes, but that's just /my opinion/. However, other people have different opinions on this matter (and we know that from the people who considered XFS v4 -> v5 going slower because iversion a major regression), and so we must acknowledge those opinions even if we don't agree with them. That is, if people are using noiversion for performance reasons on filesystems that are not designed/intended to have it permanently enabled, then yanking that functionality out from under them without warning is a Bad Thing To Do. If we want to change this behaviour, the mount option needs to be deprecated and a date/kernel release placed on when it will no longer function (at least a year in the future). When the mount option is used, it needs to log a deprecation warning to the syslog so that users are informed that the option is going away. Then when the deprecation date passes, the mount option can then be ignored by the kernel. > > IOWs, it can make certain workloads go a lot slower in some > > circumstances. And that can result in unexectedly breaking SLAs or > > slow down a complex, finely tuned data center wide workload to the > > point it no longer meets requirements. Such changes in behaviour are > > considered a regression, especially if they result from a change that > > just ignores the mount option that turned off that specific feature. > > I get that, but, what's the threshhold here for a significant risk of > regression? <shrug> I have no real idea because the filesystems that are affected are not ones I'm actively involved in supporting or developing for. That's for the people with domain specific expertise - the individual filesystem maintainers - to decide. > The "noiversion" behavior is kinda painful for NFS. Sure, but that's all you ever get on XFS v4 filesystems and any other filesystem that doesn't support persistent iversion storage. So the NFS implemenations are going to have to live with filesystems without iversion support at all for many more years to come. Cheers, Dave.
On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: > On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote: > > On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote: > > > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote: > > > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > > > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > > > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > > > > > really a significant performance hit any more, so the ability to turn it > > > > > > off is no longer useful. > > > > > > > > > > Yes, I completely agree with you here. However, with some > > > > > filesystems allowing it to be turned off, we can't just wave our > > > > > hands and force enable the option. Those filesystems - if the > > > > > maintainers chose to always enable iversion - will have to go > > > > > through a mount option deprecation period before permanently > > > > > enabling it. > > > > > > > > I don't understand why. > > > > > > > > The filesystem can continue to let people set iversion or noiversion as > > > > they like, while under the covers behaving as if iversion is always set. > > > > I can't see how that would break any application. (Or even how an > > > > application would be able to detect that the filesystem was doing this.) > > > > > > It doesn't break functionality, but it affects performance. > > > > I thought you just agreed above that any performance hit was not > > "significant". > > Yes, but that's just /my opinion/. > > However, other people have different opinions on this matter (and we > know that from the people who considered XFS v4 -> v5 going slower > because iversion a major regression), and so we must acknowledge > those opinions even if we don't agree with them. Do you have any of those reports handy? Were there numbers? --b.
On 6/19/20 8:56 PM, J. Bruce Fields wrote: > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: ... >> However, other people have different opinions on this matter (and we >> know that from the people who considered XFS v4 -> v5 going slower >> because iversion a major regression), and so we must acknowledge >> those opinions even if we don't agree with them. > > Do you have any of those reports handy? Were there numbers? I can't answer that but did a little digging. MS_I_VERSION as an option appeared here: commit 7a224228ed79d587ece2304869000aad1b8e97dd Author: Jean Noel Cordenner <jean-noel.cordenner@bull.net> Date: Mon Jan 28 23:58:27 2008 -0500 vfs: Add 64 bit i_version support The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> and ext4's Opt_i_version mount option appeared here: commit 25ec56b518257a56d2ff41a941d288e4b5ff9488 Author: Jean Noel Cordenner <jean-noel.cordenner@bull.net> Date: Mon Jan 28 23:58:27 2008 -0500 ext4: Add inode version support in ext4 This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net> so the optional enablement was there on day one, without any real explanation of why. -Eric
On Sat, Jun 20, 2020 at 12:00:43PM -0500, Eric Sandeen wrote: > On 6/19/20 8:56 PM, J. Bruce Fields wrote: > > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: > > ... > > >> However, other people have different opinions on this matter (and we > >> know that from the people who considered XFS v4 -> v5 going slower > >> because iversion a major regression), and so we must acknowledge > >> those opinions even if we don't agree with them. > > > > Do you have any of those reports handy? Were there numbers? > > I can't answer that but did a little digging. MS_I_VERSION as an option > appeared here: > ... > so the optional enablement was there on day one, without any real explanation > of why. My memory is that they didn't have measurements at first, but worried that there might be a performance issue. Which later mesurements confirmed. But that Jeff Layton's work eliminated most of that. I think ext4 was the focuse of the concern, but xfs might also have had a (less serious) regression, and btrfs might have actually had it worst? But I don't have references and my memory may be wrong. --b.
On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote: > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: > > On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote: > > > On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote: > > > > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote: > > > > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote: > > > > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote: > > > > > > > My memory was that after Jeff Layton's i_version patches, there wasn't > > > > > > > really a significant performance hit any more, so the ability to turn it > > > > > > > off is no longer useful. > > > > > > > > > > > > Yes, I completely agree with you here. However, with some > > > > > > filesystems allowing it to be turned off, we can't just wave our > > > > > > hands and force enable the option. Those filesystems - if the > > > > > > maintainers chose to always enable iversion - will have to go > > > > > > through a mount option deprecation period before permanently > > > > > > enabling it. > > > > > > > > > > I don't understand why. > > > > > > > > > > The filesystem can continue to let people set iversion or noiversion as > > > > > they like, while under the covers behaving as if iversion is always set. > > > > > I can't see how that would break any application. (Or even how an > > > > > application would be able to detect that the filesystem was doing this.) > > > > > > > > It doesn't break functionality, but it affects performance. > > > > > > I thought you just agreed above that any performance hit was not > > > "significant". > > > > Yes, but that's just /my opinion/. > > > > However, other people have different opinions on this matter (and we > > know that from the people who considered XFS v4 -> v5 going slower > > because iversion a major regression), and so we must acknowledge > > those opinions even if we don't agree with them. > > Do you have any of those reports handy? Were there numbers? e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7. Numbers were 40-47% performance degradation for in-cache writes caused by the original IVERSION implementation using iozone. There were others I recall, all realted to similar high-IOP small random writes workloads typical of databases.... Cheers, Dave.
On Sun, Jun 21, 2020 at 09:54:08AM +1000, Dave Chinner wrote: > On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote: > > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: > > > However, other people have different opinions on this matter (and we > > > know that from the people who considered XFS v4 -> v5 going slower > > > because iversion a major regression), and so we must acknowledge > > > those opinions even if we don't agree with them. > > > > Do you have any of those reports handy? Were there numbers? > > e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7. > Numbers were 40-47% performance degradation for in-cache writes > caused by the original IVERSION implementation using iozone. There > were others I recall, all realted to similar high-IOP small random > writes workloads typical of databases.... Thanks, that's an interesting bug! Though a bit tangled. This is where you identified the change attribute as the main culprit: https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c42 The test was running at 70,000 writes/s (2.2GB/s), so it was one transaction per write() syscall: timestamp updates. On CRC enabled filesystems, we have a change counter for NFSv4 - it gets incremented on every write() syscall, even when the timestamp doesn't change. That's the difference in behaviour and hence performance in this test. In RHEL8, or anything post-v4.16, the frequency of change attribute updates should be back down to that of timestamp updates on this workload. So it'd be interesting to repeat that experiment now. The bug was reporting in-house testing, and doesn't show any evidence that particular regression was encountered by users; Eric said: https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c52 Root cause of this minor in-memory regression was inode versioning behavior; as it's unlikely to have real-world effects (and has been open for years with no customer complaints) I'm closing this WONTFIX to get it off the radar. The typical user may just skip an upgrade or otherwise work around the problem rather than root-causing it like this, so absence of reports isn't conclusive. I understand wanting to err on the side of caution. But if that regression's mostly addressed now, then I'm still inclined to think it's time to just leave this on everywhere. --b.
On Mon, Jun 22, 2020 at 05:26:12PM -0400, J. Bruce Fields wrote: > On Sun, Jun 21, 2020 at 09:54:08AM +1000, Dave Chinner wrote: > > On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote: > > > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote: > > > > However, other people have different opinions on this matter (and we > > > > know that from the people who considered XFS v4 -> v5 going slower > > > > because iversion a major regression), and so we must acknowledge > > > > those opinions even if we don't agree with them. > > > > > > Do you have any of those reports handy? Were there numbers? > > > > e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7. > > Numbers were 40-47% performance degradation for in-cache writes > > caused by the original IVERSION implementation using iozone. There > > were others I recall, all realted to similar high-IOP small random > > writes workloads typical of databases.... > > Thanks, that's an interesting bug! Though a bit tangled. This is where > you identified the change attribute as the main culprit: > > https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c42 > > The test was running at 70,000 writes/s (2.2GB/s), so it was one > transaction per write() syscall: timestamp updates. On CRC > enabled filesystems, we have a change counter for NFSv4 - it > gets incremented on every write() syscall, even when the > timestamp doesn't change. That's the difference in behaviour and > hence performance in this test. > > In RHEL8, or anything post-v4.16, the frequency of change attribute > updates should be back down to that of timestamp updates on this > workload. So it'd be interesting to repeat that experiment now. Yup, which in itself has been a problem for similar workloads. There's a reason we now recommend the use of lazytime for high performance database workloads that can do hundreds of thousands of small write IOs a second... > The bug was reporting in-house testing, and doesn't show any evidence > that particular regression was encountered by users; Eric said: > > https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c52 > > Root cause of this minor in-memory regression was inode > versioning behavior; as it's unlikely to have real-world effects > (and has been open for years with no customer complaints) I'm > closing this WONTFIX to get it off the radar. It's just the first I found because bugzilla has a slow, less than useful search engine. We know that real applications have hit this, and we know even the overhead of timestamp updates on writes is way too high for them. > The typical user may just skip an upgrade or otherwise work around the > problem rather than root-causing it like this, so absence of reports > isn't conclusive. I understand wanting to err on the side of caution. Yup, it's a generic problem - just because we've worked around or mitigated the most common situations it impacts performance, that doesn't mean they work for everyone.... Cheers, Dave.
On 6/18/20 3:39 PM, Dave Chinner wrote: > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote: ... >> Thank you for pointed it out. >> How about following change? I believe it works both xfs and btrfs... >> >> diff --git a/fs/super.c b/fs/super.c >> index b0a511bef4a0..42fc6334d384 100644 >> --- a/fs/super.c >> +++ b/fs/super.c >> @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc) >> } >> } >> >> + if (sb->s_flags & SB_I_VERSION) >> + fc->sb_flags |= MS_I_VERSION; >> + >> WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | >> (fc->sb_flags & fc->sb_flags_mask))); >> /* Needs to be ordered wrt mnt_is_readonly() */ > > This will prevent SB_I_VERSION from being turned off at all. That > will break existing filesystems that allow SB_I_VERSION to be turned > off on remount, such as ext4. > > The manipulations here need to be in the filesystem specific code; > we screwed this one up so badly there is no "one size fits all" > behaviour that we can implement in the generic code... I wandered back into this thread for some reason ... ;) Since iversion/noiversion is /already/ advertised as a vfs-level mount option, wouldn't exposing it in /proc/mounts solve the original problem here? ("i_version" is wrong, because it's ext4-specific, but "iversion" is handled by the vfs, so it's meaningful for any filesystems, and it will also trivially allow mount(2) to preserve it across remounts for all filesystems that set it by default.) Seems like that's the fastest path to fixing the current problems, even if a long-term goal may be to deprecate it altogether. -Eric
On Mon, Jul 13, 2020 at 04:45:19PM -0700, Eric Sandeen wrote: > I wandered back into this thread for some reason ... ;) > > Since iversion/noiversion is /already/ advertised as a vfs-level mount option, > wouldn't exposing it in /proc/mounts solve the original problem here? > > ("i_version" is wrong, because it's ext4-specific, but "iversion" is handled > by the vfs, so it's meaningful for any filesystems, and it will also trivially > allow mount(2) to preserve it across remounts for all filesystems that set it by > default.) > > Seems like that's the fastest path to fixing the current problems, even if a > long-term goal may be to deprecate it altogether. But they should not be exposed as a mount option. E.g. for XFS we decide internally if we have a useful i_version or not, totally independent of the mount option that leaked into the VFS. So we'll need to fix how the flag is used before doing any new work in this area.
On 7/14/20 1:30 AM, Christoph Hellwig wrote: > On Mon, Jul 13, 2020 at 04:45:19PM -0700, Eric Sandeen wrote: >> I wandered back into this thread for some reason ... ;) >> >> Since iversion/noiversion is /already/ advertised as a vfs-level mount option, >> wouldn't exposing it in /proc/mounts solve the original problem here? >> >> ("i_version" is wrong, because it's ext4-specific, but "iversion" is handled >> by the vfs, so it's meaningful for any filesystems, and it will also trivially >> allow mount(2) to preserve it across remounts for all filesystems that set it by >> default.) >> >> Seems like that's the fastest path to fixing the current problems, even if a >> long-term goal may be to deprecate it altogether. > > But they should not be exposed as a mount option. E.g. for XFS we > decide internally if we have a useful i_version or not, totally > independent of the mount option that leaked into the VFS. So we'll > need to fix how the flag is used before doing any new work in this area. > It's been explicitly exposed, documented, fixed, updated etc for about 12 years now. I was just hoping to make the current situation - even if we regret its mere existence - less broken, because going down a deprecation path will take us a while even if we choose it. In the meantime I'll just make sure xfs isn't broken on remount, but had hoped for a more general fix. *shrug* -Eric
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a29e8ea1a7ab..879289de1818 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2325,8 +2325,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb, SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time); if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME) SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time); - if (sb->s_flags & SB_I_VERSION) - SEQ_OPTS_PUTS("i_version"); if (nodefs || sbi->s_stripe) SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe); if (nodefs || EXT4_MOUNT_DATA_FLAGS & diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3059a9394c2d..848c4afaef05 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -49,6 +49,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb) { SB_DIRSYNC, ",dirsync" }, { SB_MANDLOCK, ",mand" }, { SB_LAZYTIME, ",lazytime" }, + { SB_I_VERSION, ",i_version" }, { 0, NULL } }; const struct proc_fs_opts *fs_infop;