Message ID | 20241025191454.72616-5-ebiggers@kernel.org |
---|---|
State | New |
Series | Wire up CRC32 library functions to arch-optimized code |
On Fri, 25 Oct 2024 at 21:15, Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> Instead of registering the crc32-$arch and crc32c-$arch algorithms if
> the arch-specific code was built, only register them when that code was
> built *and* is not falling back to the base implementation at runtime.
>
> This avoids confusing users like btrfs which checks the shash driver
> name to determine whether it is crc32c-generic.
>

I think we agree that 'generic' specifically means a C implementation
that is identical across all architectures, which is why I updated my
patch to export -arch instead of wrapping the C code in yet another
driver just for the fuzzing tests.

So why is this a problem? If no optimizations are available at
runtime, crc32-arch and crc32-generic are interchangeable, and so it
shouldn't matter whether you use one or the other.

You can infer from the driver name whether the C code is being used,
not whether or not the implementation is 'fast', and the btrfs hack is
already broken on arm64.

> (It would also make sense to change btrfs to test the crc32_optimization
> flags itself, so that it doesn't have to use the weird hack of parsing
> the driver name. This change still makes sense either way though.)
>

Indeed. That hack is very dubious and I'd be inclined just to ignore
this. On x86 and arm64, it shouldn't make a difference, given that
crc32-arch will be 'fast' in the vast majority of cases. On other
architectures, btrfs may use the C implementation while assuming it is
something faster, and if anyone actually notices the difference, we
can work with the btrfs devs to do something more sensible here.

> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  crypto/crc32_generic.c  | 8 ++++++--
>  crypto/crc32c_generic.c | 8 ++++++--
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/crc32_generic.c b/crypto/crc32_generic.c
> index cc064ea8240e..cecd01e4d6e6 100644
> --- a/crypto/crc32_generic.c
> +++ b/crypto/crc32_generic.c
> @@ -155,19 +155,23 @@ static struct shash_alg algs[] = {{
>  	.base.cra_ctxsize	= sizeof(u32),
>  	.base.cra_module	= THIS_MODULE,
>  	.base.cra_init		= crc32_cra_init,
>  }};
>
> +static int num_algs;
> +
>  static int __init crc32_mod_init(void)
>  {
>  	/* register the arch flavor only if it differs from the generic one */
> -	return crypto_register_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
> +	num_algs = 1 + ((crc32_optimizations & CRC32_LE_OPTIMIZATION) != 0);
> +
> +	return crypto_register_shashes(algs, num_algs);
>  }
>
>  static void __exit crc32_mod_fini(void)
>  {
> -	crypto_unregister_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
> +	crypto_unregister_shashes(algs, num_algs);
>  }
>
>  subsys_initcall(crc32_mod_init);
>  module_exit(crc32_mod_fini);
>
> diff --git a/crypto/crc32c_generic.c b/crypto/crc32c_generic.c
> index 04b03d825cf4..47d694da9d4a 100644
> --- a/crypto/crc32c_generic.c
> +++ b/crypto/crc32c_generic.c
> @@ -195,19 +195,23 @@ static struct shash_alg algs[] = {{
>  	.base.cra_ctxsize	= sizeof(struct chksum_ctx),
>  	.base.cra_module	= THIS_MODULE,
>  	.base.cra_init		= crc32c_cra_init,
>  }};
>
> +static int num_algs;
> +
>  static int __init crc32c_mod_init(void)
>  {
>  	/* register the arch flavor only if it differs from the generic one */
> -	return crypto_register_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
> +	num_algs = 1 + ((crc32_optimizations & CRC32C_OPTIMIZATION) != 0);
> +
> +	return crypto_register_shashes(algs, num_algs);
>  }
>
>  static void __exit crc32c_mod_fini(void)
>  {
> -	crypto_unregister_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
> +	crypto_unregister_shashes(algs, num_algs);
>  }
>
>  subsys_initcall(crc32c_mod_init);
>  module_exit(crc32c_mod_fini);
>
> --
> 2.47.0
>
>

On Fri, Oct 25, 2024 at 10:47:15PM +0200, Ard Biesheuvel wrote:
> On Fri, 25 Oct 2024 at 21:15, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Instead of registering the crc32-$arch and crc32c-$arch algorithms if
> > the arch-specific code was built, only register them when that code was
> > built *and* is not falling back to the base implementation at runtime.
> >
> > This avoids confusing users like btrfs which checks the shash driver
> > name to determine whether it is crc32c-generic.
> >
>
> I think we agree that 'generic' specifically means a C implementation
> that is identical across all architectures, which is why I updated my
> patch to export -arch instead of wrapping the C code in yet another
> driver just for the fuzzing tests.
>
> So why is this a problem? If no optimizations are available at
> runtime, crc32-arch and crc32-generic are interchangeable, and so it
> shouldn't matter whether you use one or the other.
>
> You can infer from the driver name whether the C code is being used,
> not whether or not the implementation is 'fast', and the btrfs hack is
> already broken on arm64.
>
> > (It would also make sense to change btrfs to test the crc32_optimization
> > flags itself, so that it doesn't have to use the weird hack of parsing
> > the driver name. This change still makes sense either way though.)
> >
>
> Indeed. That hack is very dubious and I'd be inclined just to ignore
> this. On x86 and arm64, it shouldn't make a difference, given that
> crc32-arch will be 'fast' in the vast majority of cases. On other
> architectures, btrfs may use the C implementation while assuming it is
> something faster, and if anyone actually notices the difference, we
> can work with the btrfs devs to do something more sensible here.

Yes, we probably could get away without this. It's never really been
appropriate to use the crypto driver names for anything important. And btrfs
probably should just assume CRC32C == fast unconditionally, like what it does
with xxHash64, or even do a quick benchmark to measure the actual speed of its
hash algorithm (which can also be sha256 or blake2b which can be very fast too).

Besides the btrfs case, my concern was there may be advice floating around about
checking /proc/crypto to check what optimized code is being used. Having
crc32-$arch potentially be running the generic code would make that misleading.
It might make sense to keep it working similar to how it did before.

But I do agree that we could probably get away without this.

- Eric

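To make the disagreement above concrete, here is a minimal userspace sketch of the two ways a caller could decide whether CRC32C is "fast": by inspecting the driver name, or by testing an optimization flag. It is not the btrfs code or the kernel API. The function names, the `SKETCH_CRC32C_OPTIMIZATION` bit value, and the example driver strings are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bit; the real value is not given in this thread. */
#define SKETCH_CRC32C_OPTIMIZATION (1u << 2)

/*
 * Name-based heuristic: any driver whose name does not contain
 * "generic" is assumed to be accelerated.
 */
static bool fast_by_name(const char *driver_name)
{
	return strstr(driver_name, "generic") == NULL;
}

/*
 * Flag-based check: report "fast" only when an optimized CRC32C
 * code path is actually active at runtime.
 */
static bool fast_by_flags(unsigned int optimizations)
{
	return (optimizations & SKETCH_CRC32C_OPTIMIZATION) != 0;
}
```

With a driver named along the lines of `crc32c-arm64` on a CPU that lacks the CRC instructions, `fast_by_name()` would still report true while `fast_by_flags(0)` reports false, which is the mismatch the patch tries to avoid.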
On Fri, Oct 25, 2024 at 10:02:39PM +0000, Eric Biggers wrote:
> On Fri, Oct 25, 2024 at 10:47:15PM +0200, Ard Biesheuvel wrote:
> > On Fri, 25 Oct 2024 at 21:15, Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > From: Eric Biggers <ebiggers@google.com>
> > >
> > > Instead of registering the crc32-$arch and crc32c-$arch algorithms if
> > > the arch-specific code was built, only register them when that code was
> > > built *and* is not falling back to the base implementation at runtime.
> > >
> > > This avoids confusing users like btrfs which checks the shash driver
> > > name to determine whether it is crc32c-generic.
> > >
> >
> > I think we agree that 'generic' specifically means a C implementation
> > that is identical across all architectures, which is why I updated my
> > patch to export -arch instead of wrapping the C code in yet another
> > driver just for the fuzzing tests.
> >
> > So why is this a problem? If no optimizations are available at
> > runtime, crc32-arch and crc32-generic are interchangeable, and so it
> > shouldn't matter whether you use one or the other.
> >
> > You can infer from the driver name whether the C code is being used,
> > not whether or not the implementation is 'fast', and the btrfs hack is
> > already broken on arm64.
> >
> > > (It would also make sense to change btrfs to test the crc32_optimization
> > > flags itself, so that it doesn't have to use the weird hack of parsing
> > > the driver name. This change still makes sense either way though.)
> > >
> >
> > Indeed. That hack is very dubious and I'd be inclined just to ignore
> > this. On x86 and arm64, it shouldn't make a difference, given that
> > crc32-arch will be 'fast' in the vast majority of cases. On other
> > architectures, btrfs may use the C implementation while assuming it is
> > something faster, and if anyone actually notices the difference, we
> > can work with the btrfs devs to do something more sensible here.
>
> Yes, we probably could get away without this. It's never really been
> appropriate to use the crypto driver names for anything important. And btrfs
> probably should just assume CRC32C == fast unconditionally, like what it does
> with xxHash64, or even do a quick benchmark to measure the actual speed of its
> hash algorithm (which can also be sha256 or blake2b which can be very fast too).
>
> Besides the btrfs case, my concern was there may be advice floating around about
> checking /proc/crypto to check what optimized code is being used. Having
> crc32-$arch potentially be running the generic code would make that misleading.
> It might make sense to keep it working similar to how it did before.
>
> But I do agree that we could probably get away without this.

While testing this patchset I notice that none of the crypto API drivers for
crc32 or crc32c even need to be loaded on my system anymore, as everything on my
system that uses those algorithms (such as ext4) just uses the library APIs now.
That makes the "check /proc/crypto" trick stop working anyway.

I think you're right that we shouldn't bother with patches 3-4, and I'll plan to
go back to leaving them out in the next version, unless someone yells.

- Eric

On Sat, 26 Oct 2024 at 06:10, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Oct 25, 2024 at 10:02:39PM +0000, Eric Biggers wrote:
> > On Fri, Oct 25, 2024 at 10:47:15PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 25 Oct 2024 at 21:15, Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > From: Eric Biggers <ebiggers@google.com>
> > > >
> > > > Instead of registering the crc32-$arch and crc32c-$arch algorithms if
> > > > the arch-specific code was built, only register them when that code was
> > > > built *and* is not falling back to the base implementation at runtime.
> > > >
> > > > This avoids confusing users like btrfs which checks the shash driver
> > > > name to determine whether it is crc32c-generic.
> > > >
> > >
> > > I think we agree that 'generic' specifically means a C implementation
> > > that is identical across all architectures, which is why I updated my
> > > patch to export -arch instead of wrapping the C code in yet another
> > > driver just for the fuzzing tests.
> > >
> > > So why is this a problem? If no optimizations are available at
> > > runtime, crc32-arch and crc32-generic are interchangeable, and so it
> > > shouldn't matter whether you use one or the other.
> > >
> > > You can infer from the driver name whether the C code is being used,
> > > not whether or not the implementation is 'fast', and the btrfs hack is
> > > already broken on arm64.
> > >
> > > > (It would also make sense to change btrfs to test the crc32_optimization
> > > > flags itself, so that it doesn't have to use the weird hack of parsing
> > > > the driver name. This change still makes sense either way though.)
> > > >
> > >
> > > Indeed. That hack is very dubious and I'd be inclined just to ignore
> > > this. On x86 and arm64, it shouldn't make a difference, given that
> > > crc32-arch will be 'fast' in the vast majority of cases. On other
> > > architectures, btrfs may use the C implementation while assuming it is
> > > something faster, and if anyone actually notices the difference, we
> > > can work with the btrfs devs to do something more sensible here.
> >
> > Yes, we probably could get away without this. It's never really been
> > appropriate to use the crypto driver names for anything important. And btrfs
> > probably should just assume CRC32C == fast unconditionally, like what it does
> > with xxHash64, or even do a quick benchmark to measure the actual speed of its
> > hash algorithm (which can also be sha256 or blake2b which can be very fast too).
> >
> > Besides the btrfs case, my concern was there may be advice floating around about
> > checking /proc/crypto to check what optimized code is being used. Having
> > crc32-$arch potentially be running the generic code would make that misleading.
> > It might make sense to keep it working similar to how it did before.
> >
> > But I do agree that we could probably get away without this.
>
> While testing this patchset I notice that none of the crypto API drivers for
> crc32 or crc32c even need to be loaded on my system anymore, as everything on my
> system that uses those algorithms (such as ext4) just uses the library APIs now.
> That makes the "check /proc/crypto" trick stop working anyway.
>
> I think you're right that we shouldn't bother with patches 3-4, and I'll plan to
> go back to leaving them out in the next version, unless someone yells.
>

Agreed.

If we need to make this distinction, it might be cleaner to use the
static_call API instead, e.g.,

+DECLARE_STATIC_CALL(crc32_le_arch, crc32_le_base);
+
 static inline u32 __pure crc32_le(u32 crc, const u8 *p, size_t len)
 {
 	if (IS_ENABLED(CONFIG_CRC32_ARCH))
-		return crc32_le_arch(crc, p, len);
+		return static_call(crc32_le_arch)(crc, p, len);
 	return crc32_le_base(crc, p, len);
 }

and use static_call_update() to update the target if the feature is
supported. Then, we could check in the driver whether the static call
points to the default or not:

+static bool have_arch;
+
 static int __init crc32_mod_init(void)
 {
+	have_arch = IS_ENABLED(CONFIG_CRC32_ARCH) &&
+		    static_call_query(crc32_le_arch) != crc32_le_base;
+
 	/* register the arch flavor only if it differs from the generic one */
-	return crypto_register_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
+	return crypto_register_shashes(algs, 1 + have_arch);
 }

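The static_call suggestion above can be approximated in plain userspace C with an ordinary function pointer. This is only a sketch: a real static call patches the call site rather than loading a pointer, and none of the kernel macros are used here. The names `crc32_le_accel`, `crc32_le_target`, `crc32_select_impl()` and `crc32_have_arch()` are hypothetical, and the "accelerated" routine is a stand-in that reuses the portable code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32_fn)(uint32_t crc, const uint8_t *p, size_t len);

/* Portable bit-at-a-time CRC32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_le_base(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/* Stand-in for an accelerated routine; it simply reuses the base code. */
static uint32_t crc32_le_accel(uint32_t crc, const uint8_t *p, size_t len)
{
	return crc32_le_base(crc, p, len);
}

/* Models the static call: the target defaults to the base routine. */
static crc32_fn crc32_le_target = crc32_le_base;

/* Models static_call_update(): retarget only if the feature is present. */
static void crc32_select_impl(bool cpu_has_feature)
{
	crc32_le_target = cpu_has_feature ? crc32_le_accel : crc32_le_base;
}

/* Models the driver-side query: is a non-default target installed? */
static bool crc32_have_arch(void)
{
	return crc32_le_target != crc32_le_base;
}

/* Callers always go through the current target. */
static uint32_t crc32_le(uint32_t crc, const uint8_t *p, size_t len)
{
	return crc32_le_target(crc, p, len);
}
```

In this model the registration code would use `1 + crc32_have_arch()` as the algorithm count, so the arch flavor appears only when the target was actually switched away from the default.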
diff --git a/crypto/crc32_generic.c b/crypto/crc32_generic.c
index cc064ea8240e..cecd01e4d6e6 100644
--- a/crypto/crc32_generic.c
+++ b/crypto/crc32_generic.c
@@ -155,19 +155,23 @@ static struct shash_alg algs[] = {{
 	.base.cra_ctxsize	= sizeof(u32),
 	.base.cra_module	= THIS_MODULE,
 	.base.cra_init		= crc32_cra_init,
 }};

+static int num_algs;
+
 static int __init crc32_mod_init(void)
 {
 	/* register the arch flavor only if it differs from the generic one */
-	return crypto_register_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
+	num_algs = 1 + ((crc32_optimizations & CRC32_LE_OPTIMIZATION) != 0);
+
+	return crypto_register_shashes(algs, num_algs);
 }

 static void __exit crc32_mod_fini(void)
 {
-	crypto_unregister_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
+	crypto_unregister_shashes(algs, num_algs);
 }

 subsys_initcall(crc32_mod_init);
 module_exit(crc32_mod_fini);

diff --git a/crypto/crc32c_generic.c b/crypto/crc32c_generic.c
index 04b03d825cf4..47d694da9d4a 100644
--- a/crypto/crc32c_generic.c
+++ b/crypto/crc32c_generic.c
@@ -195,19 +195,23 @@ static struct shash_alg algs[] = {{
 	.base.cra_ctxsize	= sizeof(struct chksum_ctx),
 	.base.cra_module	= THIS_MODULE,
 	.base.cra_init		= crc32c_cra_init,
 }};

+static int num_algs;
+
 static int __init crc32c_mod_init(void)
 {
 	/* register the arch flavor only if it differs from the generic one */
-	return crypto_register_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
+	num_algs = 1 + ((crc32_optimizations & CRC32C_OPTIMIZATION) != 0);
+
+	return crypto_register_shashes(algs, num_algs);
 }

 static void __exit crc32c_mod_fini(void)
 {
-	crypto_unregister_shashes(algs, 1 + IS_ENABLED(CONFIG_CRC32_ARCH));
+	crypto_unregister_shashes(algs, num_algs);
 }

 subsys_initcall(crc32c_mod_init);
 module_exit(crc32c_mod_fini);
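The patch replaces a compile-time count with one computed at init time and remembered for exit. The sketch below models that pattern in userspace to show why the count is cached in `num_algs`: unregistration must remove exactly as many entries as registration added. The `sketch_*` helpers, the `registered` counter, and the `SKETCH_CRC32_LE_OPTIMIZATION` bit value are assumptions for illustration and are not kernel interfaces.

```c
/* Hypothetical flag bit; the real value is not given in this thread. */
#define SKETCH_CRC32_LE_OPTIMIZATION (1u << 0)

static int num_algs;	/* count chosen at init, reused at exit */
static int registered;	/* stand-in for the crypto API's registry */

/* Stand-in for crypto_register_shashes(): record 'count' entries. */
static int sketch_register(int count)
{
	registered += count;
	return 0;
}

/* Stand-in for crypto_unregister_shashes(): drop 'count' entries. */
static void sketch_unregister(int count)
{
	registered -= count;
}

static int sketch_mod_init(unsigned int optimizations)
{
	/* Always the generic entry, plus the arch entry if optimized. */
	num_algs = 1 + ((optimizations & SKETCH_CRC32_LE_OPTIMIZATION) != 0);

	return sketch_register(num_algs);
}

static void sketch_mod_fini(void)
{
	/* Reuse the cached count so init and exit stay symmetric. */
	sketch_unregister(num_algs);
}
```

Because `(x & FLAG) != 0` evaluates to 0 or 1 in C, the count is 1 when no optimization is active and 2 otherwise.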