Message ID | 202003281643.02SGh9vH007105@sdf.org |
---|---|
State | New |
Headers | show |
Series | None | expand |
On Mar 18, 2019, at 7:32 PM, George Spelvin <lkml@sdf.org> wrote: > > This came about as part of auditing prandom_u32() usage, but > this is a special case: sometimes the starting value comes > from prandom_u32, and sometimes it comes from a hash of a name. > > Does the name hash algorithm have to be stable? In that case, this > change would alter it. But it appears to use s_hash_seed which > is regenerated on "e2fsck -D", so maybe changing it isn't a big deal. This function is only selecting a starting group when searching for a place to allocate a directory. It does not need to be stable. The use of the name hash was introduced in the following commit: f157a4aa98a18bd3817a72bea90d48494e2586e7 Author: Theodore Ts'o <tytso@mit.edu> AuthorDate: Sat Jun 13 11:09:42 2009 -0400 ext4: Use hash of topdir directory name for Orlov parent group Instead of using a random number to determine the goal parent group for Orlov top directories, use a hash of the directory name. This allows for repeatable results when trying to benchmark filesystem layout algorithms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> So I think the current patch is fine. The for-loop construct of using "++g == ngroups && (g = 0)" to wrap "g" around is new to me, but looks correct. Reviewed-by: Andreas Dilger <adilger@dilger.ca> > Signed-off-by: George Spelvin <lkml@sdf.org> > Cc: "Theodore Ts'o" <tytso@mit.edu> > Cc: Andreas Dilger <adilger.kernel@dilger.ca> > Cc: linux-ext4@vger.kernel.org > --- > fs/ext4/ialloc.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 7db0c8814f2ec..a4ea89b3ed368 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -457,9 +457,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, > grp = hinfo.hash; > } else > grp = prandom_u32(); > - parent_group = (unsigned)grp % ngroups; > - for (i = 0; i < ngroups; i++) { > - g = (parent_group + i) % ngroups; > + g = parent_group = reciprocal_scale(grp, ngroups); > + for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { > get_orlov_stats(sb, g, flex_size, &stats); > if (!stats.free_inodes) > continue; > -- > 2.26.0 > Cheers, Andreas
On Sat, Mar 28, 2020 at 04:56:17PM -0600, Andreas Dilger wrote: > On Mar 18, 2019, at 7:32 PM, George Spelvin <lkml@sdf.org> wrote: >> Does the name hash algorithm have to be stable? In that case, this >> change would alter it. But it appears to use s_hash_seed which >> is regenerated on "e2fsck -D", so maybe changing it isn't a big deal. > > This function is only selecting a starting group when searching for > a place to allocate a directory. It does not need to be stable. > > The use of the name hash was introduced in the following commit: > > f157a4aa98a18bd3817a72bea90d48494e2586e7 > Author: Theodore Ts'o <tytso@mit.edu> > AuthorDate: Sat Jun 13 11:09:42 2009 -0400 > > ext4: Use hash of topdir directory name for Orlov parent group > > Instead of using a random number to determine the goal parent group > for Orlov top directories, use a hash of the directory name. This > allows for repeatable results when trying to benchmark filesystem > layout algorithms. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > So I think the current patch is fine. The for-loop construct of > using "++g == ngroups && (g = 0)" to wrap "g" around is new to me, > but looks correct. > > Reviewed-by: Andreas Dilger <adilger@dilger.ca> Thank you. Standing back and looking from higher altitude, I missed a second modulo at fallback_retry: which should be made consistent, so I need a one re-spin. Also, we could, if desired, eliminate the i variable entirely using the fact that we have a copy of the starting position cached in parent_group. I.e. g = parent_group = reciprocal_scale(grp, ngroups); - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { + do { ... - } + if (++g == ngroups) + g = 0; + } while (g != parent_group); Or perhaps the following would be simpler, replacing the modulo with a conditional subtract: - g = parent_group = reciprocal_scale(grp, ngroups); + parent_group = reciprocal_scale(grp, ngroups); - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { + for (i = 0; i < ngroups; i++) { + g = parent_group + i; + if (g >= ngroups) + g -= ngroups; The conditional branch starts out always false, and ends up always true, but except for a few bobbles when it switches, branch prediction should handle it very well. Any preference? (Seriously, thank you for a second set of eyes. This patch set contains so many almost-identical changes that my eyes were glazing over and I couldn't see bugs.)
On Mar 28, 2020, at 5:15 PM, George Spelvin <lkml@SDF.ORG> wrote: > > On Sat, Mar 28, 2020 at 04:56:17PM -0600, Andreas Dilger wrote: >> >> So I think the current patch is fine. The for-loop construct of >> using "++g == ngroups && (g = 0)" to wrap "g" around is new to me, >> but looks correct. >> >> Reviewed-by: Andreas Dilger <adilger@dilger.ca> > > Thank you. Standing back and looking from higher altitude, I missed > a second modulo at fallback_retry: which should be made consistent, > so I need a one re-spin. > > Also, we could, if desired, eliminate the i variable entirely > using the fact that we have a copy of the starting position cached > in parent_group. I.e. > > g = parent_group = reciprocal_scale(grp, ngroups); > - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { > + do { > ... > - } > + if (++g == ngroups) > + g = 0; > + } while (g != parent_group); > > Or perhaps the following would be simpler, replacing the modulo > with a conditional subtract: > > - g = parent_group = reciprocal_scale(grp, ngroups); > + parent_group = reciprocal_scale(grp, ngroups); > - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { > + for (i = 0; i < ngroups; i++) { > + g = parent_group + i; > + if (g >= ngroups) > + g -= ngroups; > > The conditional branch starts out always false, and ends up always true, > but except for a few bobbles when it switches, branch prediction should > handle it very well. > > Any preference? I was looking at whether we could use a for-loop without "i"? Something like: for (g = parent_group + 1; g != parent_group; ++g >= ngroups && (g = 0)) The initial group is parent_group + 1, to avoid special-casing when the initial parent_group = 0 (which would prevent the loop from terminating). Cheers, Andreas
On Sat, Mar 28, 2020 at 06:10:11PM -0600, Andreas Dilger wrote: > On Mar 28, 2020, at 5:15 PM, George Spelvin <lkml@SDF.ORG> wrote: >> Also, we could, if desired, eliminate the i variable entirely >> using the fact that we have a copy of the starting position cached >> in parent_group. I.e. >> >> g = parent_group = reciprocal_scale(grp, ngroups); >> - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { >> + do { >> ... >> - } >> + if (++g == ngroups) >> + g = 0; >> + } while (g != parent_group); > I was looking at whether we could use a for-loop without "i"? Something like: > > for (g = parent_group + 1; g != parent_group; ++g >= ngroups && (g = 0)) > > The initial group is parent_group + 1, to avoid special-casing when the > initial parent_group = 0 (which would prevent the loop from terminating). That's the first option I presented, above. Since a for() loop tests before each iteration, if the counter is strictly modulo ngroups, there's no way to execute the loop body more than ngroups-1 times. That's why I changed to do{}while(), which has a minimum of 1 (it can't handle ngroups == 0), but can mimic the current loop's execution perfectly (no initial +1 offset).
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 7db0c8814f2ec..a4ea89b3ed368 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -457,9 +457,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, grp = hinfo.hash; } else grp = prandom_u32(); - parent_group = (unsigned)grp % ngroups; - for (i = 0; i < ngroups; i++) { - g = (parent_group + i) % ngroups; + g = parent_group = reciprocal_scale(grp, ngroups); + for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) { get_orlov_stats(sb, g, flex_size, &stats); if (!stats.free_inodes) continue;
This came about as part of auditing prandom_u32() usage, but this is a special case: sometimes the starting value comes from prandom_u32, and sometimes it comes from a hash of a name. Does the name hash algorithm have to be stable? In that case, this change would alter it. But it appears to use s_hash_seed which is regenerated on "e2fsck -D", so maybe changing it isn't a big deal. Feedback needed. Signed-off-by: George Spelvin <lkml@sdf.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org --- fs/ext4/ialloc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)