Message ID | 20220328220936.2724834-1-goldstein.w.n@gmail.com |
---|---|
State | New |
Headers | show |
Series | [v1,1/2] random-bits: Factor out entropy generating function | expand |
On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > On some architectures `clock_gettime` is undesirable as > it may use a syscall or there may be a faster alternative. > Future architecture specific functions can be added in > sysdeps/<arch>/random-bits-entropy.h to provide a version of > 'random_bits_entropy' that doesn't use 'clock_gettime'. > --- > include/random-bits.h | 16 ++++++-------- > sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > 2 files changed, 37 insertions(+), 10 deletions(-) > create mode 100644 sysdeps/generic/random-bits-entropy.h > > diff --git a/include/random-bits.h b/include/random-bits.h > index 17665b479a..016b87576c 100644 > --- a/include/random-bits.h > +++ b/include/random-bits.h > @@ -19,21 +19,17 @@ > #ifndef _RANDOM_BITS_H > # define _RANDOM_BITS_H > > -#include <time.h> > -#include <stdint.h> > +# include <random-bits-entropy.h> > +# include <stdint.h> > > -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > - starting time, nano-second accuracy, its randomness is significantly better > - than gettimeofday, and for mostly architectures it is implemented through > - vDSO instead of a syscall. Since the source is a system clock, the upper > - bits will have less entropy. */ > +/* Provides fast pseudo-random bits through architecture specific > + random_bits_entropy. Expectation is source is some timing function so > + the upper bits have less entropy. */ > static inline uint32_t > random_bits (void) > { > - struct __timespec64 tv; > - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > + uint32_t ret = random_bits_entropy (); > /* Shuffle the lower bits to minimize the clock bias. */ > - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > ret ^= (ret << 24) | (ret >> 8); > return ret; > } We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on generic interface (and other high precision timing on other architectures). So I think a better way would be to: static inline uint32_t random_bits (void) { hp_timing_t hp; HP_TIMING_NOW (hp); /* Shuffle the lower bits to minimize the clock bias. */ uint32_t ret = hp >> 32 ^ (uint32_t) hp; ret ^= (ret << 24) | (ret >> 8); return ret; } And keep the XOR on with higher bits to keep the clock bias. > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h > new file mode 100644 > index 0000000000..53290c7f7a > --- /dev/null > +++ b/sysdeps/generic/random-bits-entropy.h > @@ -0,0 +1,31 @@ > +/* Fast function for generating entropy of random_bits. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <stdint.h> > +#include <time.h> > + > +/* Generically use clock_gettime. It has unspecified starting time, nano-second > + accuracy, its randomness is significantly better than gettimeofday, and for > + mostly architectures it is implemented through vDSO instead of a syscall. */ > +static inline uint32_t > +random_bits_entropy (void) > +{ > + struct __timespec64 tv; > + __clock_gettime64 (CLOCK_MONOTONIC, &tv); > + return tv.tv_nsec ^ tv.tv_sec; > +}
On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > > On some architectures `clock_gettime` is undesirable as > > it may use a syscall or there may be a faster alternative. > > Future architecture specific functions can be added in > > sysdeps/<arch>/random-bits-entropy.h to provide a version of > > 'random_bits_entropy' that doesn't use 'clock_gettime'. > > --- > > include/random-bits.h | 16 ++++++-------- > > sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > > 2 files changed, 37 insertions(+), 10 deletions(-) > > create mode 100644 sysdeps/generic/random-bits-entropy.h > > > > diff --git a/include/random-bits.h b/include/random-bits.h > > index 17665b479a..016b87576c 100644 > > --- a/include/random-bits.h > > +++ b/include/random-bits.h > > @@ -19,21 +19,17 @@ > > #ifndef _RANDOM_BITS_H > > # define _RANDOM_BITS_H > > > > -#include <time.h> > > -#include <stdint.h> > > +# include <random-bits-entropy.h> > > +# include <stdint.h> > > > > -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > > - starting time, nano-second accuracy, its randomness is significantly better > > - than gettimeofday, and for mostly architectures it is implemented through > > - vDSO instead of a syscall. Since the source is a system clock, the upper > > - bits will have less entropy. */ > > +/* Provides fast pseudo-random bits through architecture specific > > + random_bits_entropy. Expectation is source is some timing function so > > + the upper bits have less entropy. */ > > static inline uint32_t > > random_bits (void) > > { > > - struct __timespec64 tv; > > - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > + uint32_t ret = random_bits_entropy (); > > /* Shuffle the lower bits to minimize the clock bias. */ > > - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > > ret ^= (ret << 24) | (ret >> 8); > > return ret; > > } > > We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > generic interface (and other high precision timing on other architectures). > So I think a better way would be to: For x86/generic that works but other architectures also have hp-timing implementations that might not be suitable for this (i.e there might be an entropy regression). > > static inline uint32_t > random_bits (void) > { > hp_timing_t hp; > HP_TIMING_NOW (hp); > /* Shuffle the lower bits to minimize the clock bias. */ > uint32_t ret = hp >> 32 ^ (uint32_t) hp; > ret ^= (ret << 24) | (ret >> 8); > return ret; > } > > And keep the XOR on with higher bits to keep the clock bias. > > > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h > > new file mode 100644 > > index 0000000000..53290c7f7a > > --- /dev/null > > +++ b/sysdeps/generic/random-bits-entropy.h > > @@ -0,0 +1,31 @@ > > +/* Fast function for generating entropy of random_bits. > > + Copyright (C) 2022 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +#include <stdint.h> > > +#include <time.h> > > + > > +/* Generically use clock_gettime. It has unspecified starting time, nano-second > > + accuracy, its randomness is significantly better than gettimeofday, and for > > + mostly architectures it is implemented through vDSO instead of a syscall. */ > > +static inline uint32_t > > +random_bits_entropy (void) > > +{ > > + struct __timespec64 tv; > > + __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > + return tv.tv_nsec ^ tv.tv_sec; > > +}
On Tue, Mar 29, 2022 at 2:56 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: > > > > > > > > On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > > > On some architectures `clock_gettime` is undesirable as > > > it may use a syscall or there may be a faster alternative. > > > Future architecture specific functions can be added in > > > sysdeps/<arch>/random-bits-entropy.h to provide a version of > > > 'random_bits_entropy' that doesn't use 'clock_gettime'. > > > --- > > > include/random-bits.h | 16 ++++++-------- > > > sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > > > 2 files changed, 37 insertions(+), 10 deletions(-) > > > create mode 100644 sysdeps/generic/random-bits-entropy.h > > > > > > diff --git a/include/random-bits.h b/include/random-bits.h > > > index 17665b479a..016b87576c 100644 > > > --- a/include/random-bits.h > > > +++ b/include/random-bits.h > > > @@ -19,21 +19,17 @@ > > > #ifndef _RANDOM_BITS_H > > > # define _RANDOM_BITS_H > > > > > > -#include <time.h> > > > -#include <stdint.h> > > > +# include <random-bits-entropy.h> > > > +# include <stdint.h> > > > > > > -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > > > - starting time, nano-second accuracy, its randomness is significantly better > > > - than gettimeofday, and for mostly architectures it is implemented through > > > - vDSO instead of a syscall. Since the source is a system clock, the upper > > > - bits will have less entropy. */ > > > +/* Provides fast pseudo-random bits through architecture specific > > > + random_bits_entropy. Expectation is source is some timing function so > > > + the upper bits have less entropy. */ > > > static inline uint32_t > > > random_bits (void) > > > { > > > - struct __timespec64 tv; > > > - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > > + uint32_t ret = random_bits_entropy (); > > > /* Shuffle the lower bits to minimize the clock bias. */ > > > - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > > > ret ^= (ret << 24) | (ret >> 8); > > > return ret; > > > } > > > > We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > > generic interface (and other high precision timing on other architectures). > > So I think a better way would be to: > > For x86/generic that works but other architectures also have hp-timing > implementations that might not be suitable for this (i.e there might be > an entropy regression). > > > > > static inline uint32_t > > random_bits (void) > > { > > hp_timing_t hp; > > HP_TIMING_NOW (hp); Also not HP_TIMING_NOW will be slightly slower (without reason) as instead of xoring ns and seconds it does ns + (second * 10^9). Seems to generate the same amount of entropy so it's just an extra multiply on the critical path. > > /* Shuffle the lower bits to minimize the clock bias. */ > > uint32_t ret = hp >> 32 ^ (uint32_t) hp; > > ret ^= (ret << 24) | (ret >> 8); > > return ret; > > } > > > > And keep the XOR on with higher bits to keep the clock bias. > > > > > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h > > > new file mode 100644 > > > index 0000000000..53290c7f7a > > > --- /dev/null > > > +++ b/sysdeps/generic/random-bits-entropy.h > > > @@ -0,0 +1,31 @@ > > > +/* Fast function for generating entropy of random_bits. > > > + Copyright (C) 2022 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/or > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version. > > > + > > > + The GNU C Library is distributed in the hope that it will be useful, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + <https://www.gnu.org/licenses/>. */ > > > + > > > +#include <stdint.h> > > > +#include <time.h> > > > + > > > +/* Generically use clock_gettime. It has unspecified starting time, nano-second > > > + accuracy, its randomness is significantly better than gettimeofday, and for > > > + mostly architectures it is implemented through vDSO instead of a syscall. */ > > > +static inline uint32_t > > > +random_bits_entropy (void) > > > +{ > > > + struct __timespec64 tv; > > > + __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > > + return tv.tv_nsec ^ tv.tv_sec; > > > +}
On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> wrote: > > On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: > > > > > > > > On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > > > On some architectures `clock_gettime` is undesirable as > > > it may use a syscall or there may be a faster alternative. > > > Future architecture specific functions can be added in > > > sysdeps/<arch>/random-bits-entropy.h to provide a version of > > > 'random_bits_entropy' that doesn't use 'clock_gettime'. > > > --- > > > include/random-bits.h | 16 ++++++-------- > > > sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > > > 2 files changed, 37 insertions(+), 10 deletions(-) > > > create mode 100644 sysdeps/generic/random-bits-entropy.h > > > > > > diff --git a/include/random-bits.h b/include/random-bits.h > > > index 17665b479a..016b87576c 100644 > > > --- a/include/random-bits.h > > > +++ b/include/random-bits.h > > > @@ -19,21 +19,17 @@ > > > #ifndef _RANDOM_BITS_H > > > # define _RANDOM_BITS_H > > > > > > -#include <time.h> > > > -#include <stdint.h> > > > +# include <random-bits-entropy.h> > > > +# include <stdint.h> > > > > > > -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > > > - starting time, nano-second accuracy, its randomness is significantly better > > > - than gettimeofday, and for mostly architectures it is implemented through > > > - vDSO instead of a syscall. Since the source is a system clock, the upper > > > - bits will have less entropy. */ > > > +/* Provides fast pseudo-random bits through architecture specific > > > + random_bits_entropy. Expectation is source is some timing function so > > > + the upper bits have less entropy. */ > > > static inline uint32_t > > > random_bits (void) > > > { > > > - struct __timespec64 tv; > > > - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > > + uint32_t ret = random_bits_entropy (); > > > /* Shuffle the lower bits to minimize the clock bias. */ > > > - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > > > ret ^= (ret << 24) | (ret >> 8); > > > return ret; > > > } > > > > We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > > generic interface (and other high precision timing on other architectures). > > So I think a better way would be to: > > For x86/generic that works but other architectures also have hp-timing > implementations that might not be suitable for this (i.e there might be > an entropy regression). The default hp-timing.h has # define HP_TIMING_NOW(var) \ ({ \ struct __timespec64 tv; \ __clock_gettime64 (CLOCK_MONOTONIC, &tv); \ (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \ }) It isn't the same as the current include/random-bits.h. > > > > static inline uint32_t > > random_bits (void) > > { > > hp_timing_t hp; > > HP_TIMING_NOW (hp); > > /* Shuffle the lower bits to minimize the clock bias. */ > > uint32_t ret = hp >> 32 ^ (uint32_t) hp; > > ret ^= (ret << 24) | (ret >> 8); > > return ret; > > } > > > > And keep the XOR on with higher bits to keep the clock bias. > > > > > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h > > > new file mode 100644 > > > index 0000000000..53290c7f7a > > > --- /dev/null > > > +++ b/sysdeps/generic/random-bits-entropy.h > > > @@ -0,0 +1,31 @@ > > > +/* Fast function for generating entropy of random_bits. > > > + Copyright (C) 2022 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/or > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version. > > > + > > > + The GNU C Library is distributed in the hope that it will be useful, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + <https://www.gnu.org/licenses/>. */ > > > + > > > +#include <stdint.h> > > > +#include <time.h> > > > + > > > +/* Generically use clock_gettime. It has unspecified starting time, nano-second > > > + accuracy, its randomness is significantly better than gettimeofday, and for > > > + mostly architectures it is implemented through vDSO instead of a syscall. */ > > > +static inline uint32_t > > > +random_bits_entropy (void) > > > +{ > > > + struct __timespec64 tv; > > > + __clock_gettime64 (CLOCK_MONOTONIC, &tv); > > > + return tv.tv_nsec ^ tv.tv_sec; > > > +}
On 29/03/2022 16:56, Noah Goldstein wrote: > On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: >>> On some architectures `clock_gettime` is undesirable as >>> it may use a syscall or there may be a faster alternative. >>> Future architecture specific functions can be added in >>> sysdeps/<arch>/random-bits-entropy.h to provide a version of >>> 'random_bits_entropy' that doesn't use 'clock_gettime'. >>> --- >>> include/random-bits.h | 16 ++++++-------- >>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ >>> 2 files changed, 37 insertions(+), 10 deletions(-) >>> create mode 100644 sysdeps/generic/random-bits-entropy.h >>> >>> diff --git a/include/random-bits.h b/include/random-bits.h >>> index 17665b479a..016b87576c 100644 >>> --- a/include/random-bits.h >>> +++ b/include/random-bits.h >>> @@ -19,21 +19,17 @@ >>> #ifndef _RANDOM_BITS_H >>> # define _RANDOM_BITS_H >>> >>> -#include <time.h> >>> -#include <stdint.h> >>> +# include <random-bits-entropy.h> >>> +# include <stdint.h> >>> >>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified >>> - starting time, nano-second accuracy, its randomness is significantly better >>> - than gettimeofday, and for mostly architectures it is implemented through >>> - vDSO instead of a syscall. Since the source is a system clock, the upper >>> - bits will have less entropy. */ >>> +/* Provides fast pseudo-random bits through architecture specific >>> + random_bits_entropy. Expectation is source is some timing function so >>> + the upper bits have less entropy. */ >>> static inline uint32_t >>> random_bits (void) >>> { >>> - struct __timespec64 tv; >>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); >>> + uint32_t ret = random_bits_entropy (); >>> /* Shuffle the lower bits to minimize the clock bias. */ >>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; >>> ret ^= (ret << 24) | (ret >> 8); >>> return ret; >>> } >> >> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on >> generic interface (and other high precision timing on other architectures). >> So I think a better way would be to: > > For x86/generic that works but other architectures also have hp-timing > implementations that might not be suitable for this (i.e there might be > an entropy regression). I would expect that the entropy of the hp-timing.h instruction would be similar to the ones from system clock (which exception of legacy architecture like alpha), but I haven't checked yet. > >> >> static inline uint32_t >> random_bits (void) >> { >> hp_timing_t hp; >> HP_TIMING_NOW (hp); >> /* Shuffle the lower bits to minimize the clock bias. */ >> uint32_t ret = hp >> 32 ^ (uint32_t) hp; >> ret ^= (ret << 24) | (ret >> 8); >> return ret; >> } >> >> And keep the XOR on with higher bits to keep the clock bias. >> >>> diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h >>> new file mode 100644 >>> index 0000000000..53290c7f7a >>> --- /dev/null >>> +++ b/sysdeps/generic/random-bits-entropy.h >>> @@ -0,0 +1,31 @@ >>> +/* Fast function for generating entropy of random_bits. >>> + Copyright (C) 2022 Free Software Foundation, Inc. >>> + This file is part of the GNU C Library. >>> + >>> + The GNU C Library is free software; you can redistribute it and/or >>> + modify it under the terms of the GNU Lesser General Public >>> + License as published by the Free Software Foundation; either >>> + version 2.1 of the License, or (at your option) any later version. >>> + >>> + The GNU C Library is distributed in the hope that it will be useful, >>> + but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> + Lesser General Public License for more details. >>> + >>> + You should have received a copy of the GNU Lesser General Public >>> + License along with the GNU C Library; if not, see >>> + <https://www.gnu.org/licenses/>. */ >>> + >>> +#include <stdint.h> >>> +#include <time.h> >>> + >>> +/* Generically use clock_gettime. It has unspecified starting time, nano-second >>> + accuracy, its randomness is significantly better than gettimeofday, and for >>> + mostly architectures it is implemented through vDSO instead of a syscall. */ >>> +static inline uint32_t >>> +random_bits_entropy (void) >>> +{ >>> + struct __timespec64 tv; >>> + __clock_gettime64 (CLOCK_MONOTONIC, &tv); >>> + return tv.tv_nsec ^ tv.tv_sec; >>> +}
On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 29/03/2022 16:56, Noah Goldstein wrote: > > On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > > <adhemerval.zanella@linaro.org> wrote: > >> > >> > >> > >> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > >>> On some architectures `clock_gettime` is undesirable as > >>> it may use a syscall or there may be a faster alternative. > >>> Future architecture specific functions can be added in > >>> sysdeps/<arch>/random-bits-entropy.h to provide a version of > >>> 'random_bits_entropy' that doesn't use 'clock_gettime'. > >>> --- > >>> include/random-bits.h | 16 ++++++-------- > >>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > >>> 2 files changed, 37 insertions(+), 10 deletions(-) > >>> create mode 100644 sysdeps/generic/random-bits-entropy.h > >>> > >>> diff --git a/include/random-bits.h b/include/random-bits.h > >>> index 17665b479a..016b87576c 100644 > >>> --- a/include/random-bits.h > >>> +++ b/include/random-bits.h > >>> @@ -19,21 +19,17 @@ > >>> #ifndef _RANDOM_BITS_H > >>> # define _RANDOM_BITS_H > >>> > >>> -#include <time.h> > >>> -#include <stdint.h> > >>> +# include <random-bits-entropy.h> > >>> +# include <stdint.h> > >>> > >>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > >>> - starting time, nano-second accuracy, its randomness is significantly better > >>> - than gettimeofday, and for mostly architectures it is implemented through > >>> - vDSO instead of a syscall. Since the source is a system clock, the upper > >>> - bits will have less entropy. */ > >>> +/* Provides fast pseudo-random bits through architecture specific > >>> + random_bits_entropy. Expectation is source is some timing function so > >>> + the upper bits have less entropy. */ > >>> static inline uint32_t > >>> random_bits (void) > >>> { > >>> - struct __timespec64 tv; > >>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > >>> + uint32_t ret = random_bits_entropy (); > >>> /* Shuffle the lower bits to minimize the clock bias. */ > >>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > >>> ret ^= (ret << 24) | (ret >> 8); > >>> return ret; > >>> } > >> > >> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > >> generic interface (and other high precision timing on other architectures). > >> So I think a better way would be to: > > > > For x86/generic that works but other architectures also have hp-timing > > implementations that might not be suitable for this (i.e there might be > > an entropy regression). > > I would expect that the entropy of the hp-timing.h instruction would be similar > to the ones from system clock (which exception of legacy architecture like alpha), > but I haven't checked yet. Would expect the same, but think it will probably take a test on a per-arch basis. Also there are optimizations we can make since we only need the lower 32-bits and not a true timestamp. I.e no multiply for generic. Also on x86 we can skip combining the results of rdtsc. > > > > >> > >> static inline uint32_t > >> random_bits (void) > >> { > >> hp_timing_t hp; > >> HP_TIMING_NOW (hp); > >> /* Shuffle the lower bits to minimize the clock bias. */ > >> uint32_t ret = hp >> 32 ^ (uint32_t) hp; > >> ret ^= (ret << 24) | (ret >> 8); > >> return ret; > >> } > >> > >> And keep the XOR on with higher bits to keep the clock bias. > >> > >>> diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h > >>> new file mode 100644 > >>> index 0000000000..53290c7f7a > >>> --- /dev/null > >>> +++ b/sysdeps/generic/random-bits-entropy.h > >>> @@ -0,0 +1,31 @@ > >>> +/* Fast function for generating entropy of random_bits. > >>> + Copyright (C) 2022 Free Software Foundation, Inc. > >>> + This file is part of the GNU C Library. > >>> + > >>> + The GNU C Library is free software; you can redistribute it and/or > >>> + modify it under the terms of the GNU Lesser General Public > >>> + License as published by the Free Software Foundation; either > >>> + version 2.1 of the License, or (at your option) any later version. > >>> + > >>> + The GNU C Library is distributed in the hope that it will be useful, > >>> + but WITHOUT ANY WARRANTY; without even the implied warranty of > >>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >>> + Lesser General Public License for more details. > >>> + > >>> + You should have received a copy of the GNU Lesser General Public > >>> + License along with the GNU C Library; if not, see > >>> + <https://www.gnu.org/licenses/>. */ > >>> + > >>> +#include <stdint.h> > >>> +#include <time.h> > >>> + > >>> +/* Generically use clock_gettime. It has unspecified starting time, nano-second > >>> + accuracy, its randomness is significantly better than gettimeofday, and for > >>> + mostly architectures it is implemented through vDSO instead of a syscall. */ > >>> +static inline uint32_t > >>> +random_bits_entropy (void) > >>> +{ > >>> + struct __timespec64 tv; > >>> + __clock_gettime64 (CLOCK_MONOTONIC, &tv); > >>> + return tv.tv_nsec ^ tv.tv_sec; > >>> +}
On 29/03/2022 17:14, H.J. Lu wrote: > On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha > <libc-alpha@sourceware.org> wrote: >> >> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella >> <adhemerval.zanella@linaro.org> wrote: >>> >>> >>> >>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: >>>> On some architectures `clock_gettime` is undesirable as >>>> it may use a syscall or there may be a faster alternative. >>>> Future architecture specific functions can be added in >>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of >>>> 'random_bits_entropy' that doesn't use 'clock_gettime'. >>>> --- >>>> include/random-bits.h | 16 ++++++-------- >>>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ >>>> 2 files changed, 37 insertions(+), 10 deletions(-) >>>> create mode 100644 sysdeps/generic/random-bits-entropy.h >>>> >>>> diff --git a/include/random-bits.h b/include/random-bits.h >>>> index 17665b479a..016b87576c 100644 >>>> --- a/include/random-bits.h >>>> +++ b/include/random-bits.h >>>> @@ -19,21 +19,17 @@ >>>> #ifndef _RANDOM_BITS_H >>>> # define _RANDOM_BITS_H >>>> >>>> -#include <time.h> >>>> -#include <stdint.h> >>>> +# include <random-bits-entropy.h> >>>> +# include <stdint.h> >>>> >>>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified >>>> - starting time, nano-second accuracy, its randomness is significantly better >>>> - than gettimeofday, and for mostly architectures it is implemented through >>>> - vDSO instead of a syscall. Since the source is a system clock, the upper >>>> - bits will have less entropy. */ >>>> +/* Provides fast pseudo-random bits through architecture specific >>>> + random_bits_entropy. Expectation is source is some timing function so >>>> + the upper bits have less entropy. */ >>>> static inline uint32_t >>>> random_bits (void) >>>> { >>>> - struct __timespec64 tv; >>>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); >>>> + uint32_t ret = random_bits_entropy (); >>>> /* Shuffle the lower bits to minimize the clock bias. */ >>>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; >>>> ret ^= (ret << 24) | (ret >> 8); >>>> return ret; >>>> } >>> >>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on >>> generic interface (and other high precision timing on other architectures). >>> So I think a better way would be to: >> >> For x86/generic that works but other architectures also have hp-timing >> implementations that might not be suitable for this (i.e there might be >> an entropy regression). > > The default hp-timing.h has > > # define HP_TIMING_NOW(var) \ > ({ \ > struct __timespec64 tv; \ > __clock_gettime64 (CLOCK_MONOTONIC, &tv); \ > (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \ > }) > > It isn't the same as the current include/random-bits.h. Maybe refactor hp-timing.h to add a routine to get the system clock without any adjustments? I don't have a strong preference here, but I take that you are not aiming to use RDRAND or similar instruction, since you are optimizing to latency. So I see that using hp-timing.h seems the best approach, since it should work similar on different architectures (and each one might disable if the entropy is not large enough).
On Tue, Mar 29, 2022 at 3:44 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 29/03/2022 17:14, H.J. Lu wrote: > > On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha > > <libc-alpha@sourceware.org> wrote: > >> > >> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > >> <adhemerval.zanella@linaro.org> wrote: > >>> > >>> > >>> > >>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > >>>> On some architectures `clock_gettime` is undesirable as > >>>> it may use a syscall or there may be a faster alternative. > >>>> Future architecture specific functions can be added in > >>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of > >>>> 'random_bits_entropy' that doesn't use 'clock_gettime'. > >>>> --- > >>>> include/random-bits.h | 16 ++++++-------- > >>>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > >>>> 2 files changed, 37 insertions(+), 10 deletions(-) > >>>> create mode 100644 sysdeps/generic/random-bits-entropy.h > >>>> > >>>> diff --git a/include/random-bits.h b/include/random-bits.h > >>>> index 17665b479a..016b87576c 100644 > >>>> --- a/include/random-bits.h > >>>> +++ b/include/random-bits.h > >>>> @@ -19,21 +19,17 @@ > >>>> #ifndef _RANDOM_BITS_H > >>>> # define _RANDOM_BITS_H > >>>> > >>>> -#include <time.h> > >>>> -#include <stdint.h> > >>>> +# include <random-bits-entropy.h> > >>>> +# include <stdint.h> > >>>> > >>>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > >>>> - starting time, nano-second accuracy, its randomness is significantly better > >>>> - than gettimeofday, and for mostly architectures it is implemented through > >>>> - vDSO instead of a syscall. Since the source is a system clock, the upper > >>>> - bits will have less entropy. */ > >>>> +/* Provides fast pseudo-random bits through architecture specific > >>>> + random_bits_entropy. Expectation is source is some timing function so > >>>> + the upper bits have less entropy. */ > >>>> static inline uint32_t > >>>> random_bits (void) > >>>> { > >>>> - struct __timespec64 tv; > >>>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > >>>> + uint32_t ret = random_bits_entropy (); > >>>> /* Shuffle the lower bits to minimize the clock bias. */ > >>>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > >>>> ret ^= (ret << 24) | (ret >> 8); > >>>> return ret; > >>>> } > >>> > >>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > >>> generic interface (and other high precision timing on other architectures). > >>> So I think a better way would be to: > >> > >> For x86/generic that works but other architectures also have hp-timing > >> implementations that might not be suitable for this (i.e there might be > >> an entropy regression). > > > > The default hp-timing.h has > > > > # define HP_TIMING_NOW(var) \ > > ({ \ > > struct __timespec64 tv; \ > > __clock_gettime64 (CLOCK_MONOTONIC, &tv); \ > > (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \ > > }) > > > > It isn't the same as the current include/random-bits.h. > > Maybe refactor hp-timing.h to add a routine to get the system clock without > any adjustments? I don't have a strong preference here, but I take that > you are not aiming to use RDRAND or similar instruction, since you are > optimizing to latency. So I see that using hp-timing.h seems the best > approach, since it should work similar on different architectures (and > each one might disable if the entropy is not large enough). That would work. But we would still need to hardcode the random-bits needs in some cases. For example in generic we would still need to combine seconds and nanoseconds and ideally wouldn't use multiply for that. At that point it seems we are doing something logically different enough it would make more sense to just include <hp-timing.h> in <random-bits-entropy.h> if it was appropriate for the architecture.
On 29/03/2022 17:44, Noah Goldstein wrote: > On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 29/03/2022 16:56, Noah Goldstein wrote: >>> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella >>> <adhemerval.zanella@linaro.org> wrote: >>>> >>>> >>>> >>>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: >>>>> On some architectures `clock_gettime` is undesirable as >>>>> it may use a syscall or there may be a faster alternative. >>>>> Future architecture specific functions can be added in >>>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of >>>>> 'random_bits_entropy' that doesn't use 'clock_gettime'. >>>>> --- >>>>> include/random-bits.h | 16 ++++++-------- >>>>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ >>>>> 2 files changed, 37 insertions(+), 10 deletions(-) >>>>> create mode 100644 sysdeps/generic/random-bits-entropy.h >>>>> >>>>> diff --git a/include/random-bits.h b/include/random-bits.h >>>>> index 17665b479a..016b87576c 100644 >>>>> --- a/include/random-bits.h >>>>> +++ b/include/random-bits.h >>>>> @@ -19,21 +19,17 @@ >>>>> #ifndef _RANDOM_BITS_H >>>>> # define _RANDOM_BITS_H >>>>> >>>>> -#include <time.h> >>>>> -#include <stdint.h> >>>>> +# include <random-bits-entropy.h> >>>>> +# include <stdint.h> >>>>> >>>>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified >>>>> - starting time, nano-second accuracy, its randomness is significantly better >>>>> - than gettimeofday, and for mostly architectures it is implemented through >>>>> - vDSO instead of a syscall. Since the source is a system clock, the upper >>>>> - bits will have less entropy. */ >>>>> +/* Provides fast pseudo-random bits through architecture specific >>>>> + random_bits_entropy. Expectation is source is some timing function so >>>>> + the upper bits have less entropy. */ >>>>> static inline uint32_t >>>>> random_bits (void) >>>>> { >>>>> - struct __timespec64 tv; >>>>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); >>>>> + uint32_t ret = random_bits_entropy (); >>>>> /* Shuffle the lower bits to minimize the clock bias. */ >>>>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; >>>>> ret ^= (ret << 24) | (ret >> 8); >>>>> return ret; >>>>> } >>>> >>>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on >>>> generic interface (and other high precision timing on other architectures). >>>> So I think a better way would be to: >>> >>> For x86/generic that works but other architectures also have hp-timing >>> implementations that might not be suitable for this (i.e there might be >>> an entropy regression). >> >> I would expect that the entropy of the hp-timing.h instruction would be similar >> to the ones from system clock (which exception of legacy architecture like alpha), >> but I haven't checked yet. > > Would expect the same, but think it will probably take a test on a > per-arch basis. > > Also there are optimizations we can make since we only need the lower > 32-bits and > not a true timestamp. > > I.e no multiply for generic. Also on x86 we can skip combining the > results of rdtsc. I tested the entropy on some different architectures: aarch64: $ ent gettime-random.txt Entropy = 7.293634 bits per byte. $ ent hptiming-random.txt Entropy = 6.451314 bits per byte. ia64: $ ent gettime-random.txt Entropy = 7.613066 bits per byte. $ ent hptiming-random.txt Entropy = 7.458615 bits per byte. powerpc64le: $ ent gettime-random.txt Entropy = 7.413584 bits per byte. $ ent hptiming-random.txt Entropy = 7.243894 bits per byte. sparc64 $ ent gettime-random.txt Entropy = 7.388590 bits per byte. $ ent hptiming-random.txt Entropy = 7.602368 bits per byte. So it seems that only aarch64 is really losing some entropy when using hp-timing.h (not sure why).
On Wed, Mar 30, 2022 at 10:37 AM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 29/03/2022 17:44, Noah Goldstein wrote: > > On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella > > <adhemerval.zanella@linaro.org> wrote: > >> > >> > >> > >> On 29/03/2022 16:56, Noah Goldstein wrote: > >>> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella > >>> <adhemerval.zanella@linaro.org> wrote: > >>>> > >>>> > >>>> > >>>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote: > >>>>> On some architectures `clock_gettime` is undesirable as > >>>>> it may use a syscall or there may be a faster alternative. > >>>>> Future architecture specific functions can be added in > >>>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of > >>>>> 'random_bits_entropy' that doesn't use 'clock_gettime'. > >>>>> --- > >>>>> include/random-bits.h | 16 ++++++-------- > >>>>> sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++ > >>>>> 2 files changed, 37 insertions(+), 10 deletions(-) > >>>>> create mode 100644 sysdeps/generic/random-bits-entropy.h > >>>>> > >>>>> diff --git a/include/random-bits.h b/include/random-bits.h > >>>>> index 17665b479a..016b87576c 100644 > >>>>> --- a/include/random-bits.h > >>>>> +++ b/include/random-bits.h > >>>>> @@ -19,21 +19,17 @@ > >>>>> #ifndef _RANDOM_BITS_H > >>>>> # define _RANDOM_BITS_H > >>>>> > >>>>> -#include <time.h> > >>>>> -#include <stdint.h> > >>>>> +# include <random-bits-entropy.h> > >>>>> +# include <stdint.h> > >>>>> > >>>>> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified > >>>>> - starting time, nano-second accuracy, its randomness is significantly better > >>>>> - than gettimeofday, and for mostly architectures it is implemented through > >>>>> - vDSO instead of a syscall. Since the source is a system clock, the upper > >>>>> - bits will have less entropy. */ > >>>>> +/* Provides fast pseudo-random bits through architecture specific > >>>>> + random_bits_entropy. Expectation is source is some timing function so > >>>>> + the upper bits have less entropy. */ > >>>>> static inline uint32_t > >>>>> random_bits (void) > >>>>> { > >>>>> - struct __timespec64 tv; > >>>>> - __clock_gettime64 (CLOCK_MONOTONIC, &tv); > >>>>> + uint32_t ret = random_bits_entropy (); > >>>>> /* Shuffle the lower bits to minimize the clock bias. */ > >>>>> - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; > >>>>> ret ^= (ret << 24) | (ret >> 8); > >>>>> return ret; > >>>>> } > >>>> > >>>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on > >>>> generic interface (and other high precision timing on other architectures). > >>>> So I think a better way would be to: > >>> > >>> For x86/generic that works but other architectures also have hp-timing > >>> implementations that might not be suitable for this (i.e there might be > >>> an entropy regression). > >> > >> I would expect that the entropy of the hp-timing.h instruction would be similar > >> to the ones from system clock (which exception of legacy architecture like alpha), > >> but I haven't checked yet. > > > > Would expect the same, but think it will probably take a test on a > > per-arch basis. > > > > Also there are optimizations we can make since we only need the lower > > 32-bits and > > not a true timestamp. > > > > I.e no multiply for generic. Also on x86 we can skip combining the > > results of rdtsc. > > I tested the entropy on some different architectures: > > aarch64: > $ ent gettime-random.txt > Entropy = 7.293634 bits per byte. > $ ent hptiming-random.txt > Entropy = 6.451314 bits per byte. > > ia64: > $ ent gettime-random.txt > Entropy = 7.613066 bits per byte. > $ ent hptiming-random.txt > Entropy = 7.458615 bits per byte. > > powerpc64le: > $ ent gettime-random.txt > Entropy = 7.413584 bits per byte. > $ ent hptiming-random.txt > Entropy = 7.243894 bits per byte. > > sparc64 > $ ent gettime-random.txt > Entropy = 7.388590 bits per byte. > $ ent hptiming-random.txt > Entropy = 7.602368 bits per byte. > > > So it seems that only aarch64 is really losing some entropy when using > hp-timing.h (not sure why). Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that since it takes testing and there are optimization cases that we may want to make it's still best to have a seperate file.
On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> wrote: > Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that > since it takes testing and there are optimization cases that we mayf > want to make it's still best to have a seperate file. What is really needed is to get some sort of randomness function exposed through the VDSO, so there is no syscall overhead and all this magic can be dropped. I'm not even sure if that is possible. that or a reliable prng usable even by the dynamic linker.
Hi Christian, Thanks for looping me into this thread. On Wed, Mar 30, 2022 at 3:38 PM Cristian Rodríguez <crrodriguez@opensuse.org> wrote: > > On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha > <libc-alpha@sourceware.org> wrote: > > > Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that > > since it takes testing and there are optimization cases that we mayf > > want to make it's still best to have a seperate file. > > What is really needed is to get some sort of randomness function > exposed through the VDSO, so there is no syscall overhead and all this > magic can be dropped. I'm not even sure if that is possible. that or > a reliable prng usable even by the dynamic linker. Can someone provide some context on this? I've actually looked into having some megafast userspace RNG component in the vDSO, and I'm not fundamentally opposed to the idea. I think there's interesting potential there and something worth considering. But what's the context of you asking for this now? Under what circumstances are you finding that calling getrandom(0) or similar is too high overhead or otherwise problematic? Jason
On Thu, Mar 31, 2022 at 1:45 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Can someone provide some context on this? I've actually looked into > having some megafast userspace RNG component in the vDSO, and I'm not > fundamentally opposed to the idea. I think there's interesting > potential there and something worth considering. But what's the > context of you asking for this now? Under what circumstances are you > finding that calling getrandom(0) or similar is too high overhead or > otherwise problematic? I'm not sure in what scenario the syscall overhead is too big, Maybe it is if called in a loop..but I guess the argument here is that getrandom(0) may block or that in a non-bleeding edge system it might be too slow. (Im aware of the recent massive speedups) I believe what is needed is a PRNG, no need to guarantee to be cryptographically secure, that can be used without having to seed it from userspace, mega fast, never blocks and can be used by the dynamic linker for example. Now it will be awesome if it is all of that and CSPRNG.
On 31/03/2022 07:08, Cristian Rodríguez wrote: > On Thu, Mar 31, 2022 at 1:45 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > >> Can someone provide some context on this? I've actually looked into >> having some megafast userspace RNG component in the vDSO, and I'm not >> fundamentally opposed to the idea. I think there's interesting >> potential there and something worth considering. But what's the >> context of you asking for this now? Under what circumstances are you >> finding that calling getrandom(0) or similar is too high overhead or >> otherwise problematic? > > I'm not sure in what scenario the syscall overhead is too big, Maybe > it is if called in a loop..but I guess the argument here is that > getrandom(0) may block or that in a non-bleeding edge system it might > be too slow. (Im aware of the recent massive speedups) Current internal glibc usage where some entropy is required is not performant critical, so calling a syscall is not really a problem. However, we are evaluating tuning the mutex fastpath for PTHREAD_MUTEX_ADAPTIVE_NP, where a syscall most likely defeats the optimization. They might be other entropy usages where a userspace source would be nice thing to have, for instance on malloc hardening. > > I believe what is needed is a PRNG, no need to guarantee to be > cryptographically secure, that can be used without having to seed it > from userspace, mega fast, never blocks and can be used by the dynamic > linker for example. Now it will be awesome if it is all of that and > CSPRNG. Florian has sent a arc4random proposal somes years ago [1], the idea was pretty much what you summarized: get some entropy from kernel, initialize a PNRG, and keep a internal buffer to avoid too much overhead by the calls. It was suggested to use AES (mainly to use hardware instructions on some architecture), but I think we can use to any algorithm currently used. There are still some discussion on which is the best strategy to handle fork resets and if we need per thread buffers to optimize multithread access. An I am not sure if a vDSO would be the best interface here, it would most likely require some complicate synchronization to handle multithread access like io_ring besides limiting access if kernel provides it. I think such interface would be better to be handle by glibc itself. [1] https://sourceware.org/pipermail/libc-alpha/2018-March/092081.html
On Thu, Mar 31, 2022 at 8:17 AM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > There are still some discussion on which is the best strategy to handle > fork resets and if we need per thread buffers to optimize multithread > access. What about giving up on this and making the state small and use thread local storage for it ? I believe all this complexity to handle that cases is not worth it.
On 31/03/2022 08:25, Cristian Rodríguez wrote: > On Thu, Mar 31, 2022 at 8:17 AM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: > >> There are still some discussion on which is the best strategy to handle >> fork resets and if we need per thread buffers to optimize multithread >> access. > > What about giving up on this and making the state small and use thread > local storage for it ? > I believe all this complexity to handle that cases is not worth it. That what the per-thread buffer is about, we will need to define how much buffer would ideal, how to allocate (extending the TCB or using TLS), if the size would be tunable, which cipher to use, how to handle failures internally if the source of entropy is not available (Florian patch [1] used getrandom or /dev/random, and using AT_RANDOM might limit other usages like guard and stack pointer hardening). The AES implementation proposed uses a somewhat large state (about 300 bytes), which might add some overhead it would a per-thread buffer. And afaik other cyphers like Chacha20 have a even larger state (512 bytes). Florian patch tries to implement some lock-free access to the common buffer to decrease the state size, it might be a better option indeed. But if you check Florian's patch, most of the complexity come from fork detections where even with a per-thread buffer we will need to handle (for the forked thread). [1] https://sourceware.org/legacy-ml/libc-alpha/2018-05/msg00891.html
On Thu, Mar 31, 2022 at 8:48 AM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > The AES implementation proposed uses a somewhat large state (about 300 > bytes), which might add some overhead it would a per-thread buffer. And > afaik other cyphers like Chacha20 have a even larger state (512 bytes). Apparently chacha8 is enough for this purpose.. but I'm probably missing something.
Le 31/03/2022 à 14:14, Cristian Rodríguez via Libc-alpha a écrit : > On Thu, Mar 31, 2022 at 8:48 AM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: > >> The AES implementation proposed uses a somewhat large state (about 300 >> bytes), which might add some overhead it would a per-thread buffer. And >> afaik other cyphers like Chacha20 have a even larger state (512 bytes). > Apparently chacha8 is enough for this purpose.. but I'm probably > missing something. Chacha8 should have the same state as Chacha20 which is 64 bytes wide. Chacha8 use a a reduced round number (8) compared to Chacha20, making it faster. I've noticed for Chacha20 to be efficient on x86_64, its cost per byte is lowered by generating more than 64 bytes at a time, hence, the 512 bytes RNG state. I've generated some graph here (I'm the author of this patch): https://github.com/Parrot-Developers/libfutils/commit/0fdff8ee31f67988b68774ff04a58d0dd1d94d03 Regards.
Just so we're on the same page here, is this a discussion about optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html ? You just need a super fast random uint32_t for some future pthread changes? If so, I can send a patch to return moderately secure integers here really fast. Jason
On Thu, Mar 31, 2022 at 10:32 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Just so we're on the same page here, is this a discussion about > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html > ? > > You just need a super fast random uint32_t for some future pthread changes? On architectures with some fast non-syscall timer like x86 (`rdtsc`) will this ever be worth it? > > If so, I can send a patch to return moderately secure integers here really fast. > > Jason
On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Just so we're on the same page here, is this a discussion about > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html > ? > > You just need a super fast random uint32_t for some future pthread changes? > > If so, I can send a patch to return moderately secure integers here really fast. > > Jason That would be great if is better than the current solution.
On Thu, Mar 31, 2022 at 4:57 PM Cristian Rodríguez <crrodriguez@opensuse.org> wrote: > > On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > Just so we're on the same page here, is this a discussion about > > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html > > ? > > > > You just need a super fast random uint32_t for some future pthread changes? > > > > If so, I can send a patch to return moderately secure integers here really fast. > > > > Jason > > That would be great if is better than the current solution. Well v2 of this patch does that for x86. Just curious if I should go forward with the patchset and we can fixup the generic random-bits-entropy.h later or if the solution will be universally optimal.
Hi Cristian, On Thu, Mar 31, 2022 at 5:57 PM Cristian Rodríguez <crrodriguez@opensuse.org> wrote: > > On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > Just so we're on the same page here, is this a discussion about > > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html > > ? > > > > You just need a super fast random uint32_t for some future pthread changes? > > > > If so, I can send a patch to return moderately secure integers here really fast. > > > > Jason > > That would be great if is better than the current solution. Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c This is somewhat "secure", and maybe overkill. I can adjust this to have additional security properties (for example, preventing that counter from running backwards). Or I can adjust it to be even faster and have fewer security properties. Or maybe bench this and you'll find that it's fine. I suspect the only thing we really care about here is not leaking the AT_RANDOM value somehow, and siphash usually does a good job at that. Thoughts? Jason
On Thu, Mar 31, 2022 at 5:51 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hi Cristian, > > On Thu, Mar 31, 2022 at 5:57 PM Cristian Rodríguez > <crrodriguez@opensuse.org> wrote: > > > > On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > > > Just so we're on the same page here, is this a discussion about > > > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html > > > ? > > > > > > You just need a super fast random uint32_t for some future pthread changes? > > > > > > If so, I can send a patch to return moderately secure integers here really fast. > > > > > > Jason > > > > That would be great if is better than the current solution. > > Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c > > This is somewhat "secure", and maybe overkill. AFAIK our goal is entropy more so than security. For example if this is used to generate jiffies to stagger threads its not a security issue in any sense, it's just not ideal for performance. > > I can adjust this to have additional security properties (for example, > preventing that counter from running backwards). Or I can adjust it to > be even faster and have fewer security properties. Or maybe bench this > and you'll find that it's fine. > > I suspect the only thing we really care about here is not leaking the > AT_RANDOM value somehow, and siphash usually does a good job at that. > > Thoughts? > > Jason
Hi Noah, On Thu, Mar 31, 2022 at 7:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c > > > > This is somewhat "secure", and maybe overkill. > > AFAIK our goal is entropy more so than security. For example > if this is used to generate jiffies to stagger threads its not a security > issue in any sense, it's just not ideal for performance. Oh, okay, well just go with something linear and silly like this: https://xn--4db.cc/xrAIhCvy/c (an xorshift generator) No security, but statistically pretty good, and performs well. Jason
On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > AFAIK our goal is entropy more so than security. For example > if this is used to generate jiffies to stagger threads its not a security > issue in any sense, it's just not ideal for performance. In any case this should be more than fast enough for the other use cases of random_bits() .. maybe one new random_bits_fast() function foe edge cases where even this is too slow?
* Cristian Rodríguez: > On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha > <libc-alpha@sourceware.org> wrote: > >> Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that >> since it takes testing and there are optimization cases that we mayf >> want to make it's still best to have a seperate file. > > What is really needed is to get some sort of randomness function > exposed through the VDSO, so there is no syscall overhead and all this > magic can be dropped. I'm not even sure if that is possible. that or > a reliable prng usable even by the dynamic linker. We'd need to allocate some per-thread data, register it with the kernel (so that the kernel can apply fork protection), and pass it to the vDSO function. It's fairly involved. And we'd still need to implement fallback if we want to expose a mostly-non-failing arc4random to applications. Thanks, Florian
Hi Florian, On Mon, Apr 4, 2022 at 4:51 PM Florian Weimer <fweimer@redhat.com> wrote: > > * Cristian Rodríguez: > > > On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha > > <libc-alpha@sourceware.org> wrote: > > > >> Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that > >> since it takes testing and there are optimization cases that we mayf > >> want to make it's still best to have a seperate file. > > > > What is really needed is to get some sort of randomness function > > exposed through the VDSO, so there is no syscall overhead and all this > > magic can be dropped. I'm not even sure if that is possible. that or > > a reliable prng usable even by the dynamic linker. > > We'd need to allocate some per-thread data, register it with the kernel > (so that the kernel can apply fork protection), and pass it to the vDSO > function. It's fairly involved. > > And we'd still need to implement fallback if we want to expose a > mostly-non-failing arc4random to applications. Indeed it's involved. But not impossible and something I'd consider working on. _However_, based on what people have said in this thread, this all seems highly unnecessary, since you just need some boring statistically uniform randomness without any crypto or security requirements of any kind, and it simply needs to be fast and dumb. If that's the wrong set of requirements for this problem (I still have no idea what the bigger picture here is), please pipe up. Assuming it's true, however, some dumb xorshift thing like https://א.cc/xrAIhCvy/c seems probably fine? Jason
* Jason A. Donenfeld via Libc-alpha: > _However_, based on what people have said in this thread, this all > seems highly unnecessary, since you just need some boring > statistically uniform randomness without any crypto or security > requirements of any kind, and it simply needs to be fast and dumb. If > that's the wrong set of requirements for this problem (I still have no > idea what the bigger picture here is), please pipe up. If we can make a cryptographically secure generator fast enough, it would relieve programmers of the need to choose between that and another generator that just gives some random-looking bits fast. If programmers don't have to make a choice, they can't choose incorrectly (introducing performance bugs or security bugs). For example, the system call overhead could be prohibitive for things like hash map keys if these hash maps are not used much before they are discarded again. Thanks, Florian
On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> wrote: > > * Jason A. Donenfeld via Libc-alpha: > > > _However_, based on what people have said in this thread, this all > > seems highly unnecessary, since you just need some boring > > statistically uniform randomness without any crypto or security > > requirements of any kind, and it simply needs to be fast and dumb. If > > that's the wrong set of requirements for this problem (I still have no > > idea what the bigger picture here is), please pipe up. > > If we can make a cryptographically secure generator fast enough, it > would relieve programmers of the need to choose between that and another > generator that just gives some random-looking bits fast. If programmers > don't have to make a choice, they can't choose incorrectly (introducing > performance bugs or security bugs). It sounds like you're talking about creating a user facing API. Since random_bits is internal do we really need so much ease of use at the cost of performance? > > For example, the system call overhead could be prohibitive for things > like hash map keys if these hash maps are not used much before they are > discarded again. > > Thanks, > Florian >
On 04/04/2022 13:51, Noah Goldstein via Libc-alpha wrote: > On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha > <libc-alpha@sourceware.org> wrote: >> >> * Jason A. Donenfeld via Libc-alpha: >> >>> _However_, based on what people have said in this thread, this all >>> seems highly unnecessary, since you just need some boring >>> statistically uniform randomness without any crypto or security >>> requirements of any kind, and it simply needs to be fast and dumb. If >>> that's the wrong set of requirements for this problem (I still have no >>> idea what the bigger picture here is), please pipe up. >> >> If we can make a cryptographically secure generator fast enough, it >> would relieve programmers of the need to choose between that and another >> generator that just gives some random-looking bits fast. If programmers >> don't have to make a choice, they can't choose incorrectly (introducing >> performance bugs or security bugs). > > It sounds like you're talking about creating a user facing API. Since > random_bits > is internal do we really need so much ease of use at the cost of performance? For a user exported ABI I think it would be better to rehearse the arc4random proposal, which I might spend some time to come with a simpler solution based on Florian proposal. For this specif proposal, I see that just factoring the code to make only x86 use the rdtsc should be suffice. Ideally, if we eventually get an arc4random the idea would be use it instead. Florian initial proposal aimed to use AES so we can leverage hardware assisted optimization presented in some chips.
On 01/04/2022 15:01, Cristian Rodríguez wrote: > On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > >> AFAIK our goal is entropy more so than security. For example >> if this is used to generate jiffies to stagger threads its not a security >> issue in any sense, it's just not ideal for performance. > > In any case this should be more than fast enough for the other use > cases of random_bits() .. maybe one new random_bits_fast() function > foe edge cases where even this is too slow? I think we are bike-shedding in the same issue OpenBSD guys stumbled and which they have solved 10 years ago [1]. Essentially, we need to come up with a internal PRNG interface that can be used internally and externally, instead of reinventing cleaver ways to use the timer as entropy source. The issue is not really the cypher used, ideally it could be replace if we find out that it does not fit. The main issue is to glue together all the requirements to have a concise internal interface, taking in consideration the glibc constrains to work with multiple kernel version and environments (where we can't assume we have access to a source or reliable entropy like getrandom syscall). My plan to rehearse Florian arc4random proposal to have some simpler to where we might improve upon (a simpler fork detection for kernels without MADV_WIPEONFORK that just issue a atfork handle, maybe using ChaCha20 as virtually all other systems do, no per-thread state). [1] https://www.youtube.com/watch?v=gp_90-3R0pE
On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 01/04/2022 15:01, Cristian Rodríguez wrote: > > On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > >> AFAIK our goal is entropy more so than security. For example > >> if this is used to generate jiffies to stagger threads its not a security > >> issue in any sense, it's just not ideal for performance. > > > > In any case this should be more than fast enough for the other use > > cases of random_bits() .. maybe one new random_bits_fast() function > > foe edge cases where even this is too slow? > > I think we are bike-shedding in the same issue OpenBSD guys stumbled > and which they have solved 10 years ago [1]. Essentially, we need to > come up with a internal PRNG interface that can be used internally and > externally, instead of reinventing cleaver ways to use the timer as > entropy source. > > The issue is not really the cypher used, ideally it could be replace if > we find out that it does not fit. The main issue is to glue together > all the requirements to have a concise internal interface, taking in > consideration the glibc constrains to work with multiple kernel version > and environments (where we can't assume we have access to a source or > reliable entropy like getrandom syscall). > > My plan to rehearse Florian arc4random proposal to have some simpler > to where we might improve upon (a simpler fork detection for kernels > without MADV_WIPEONFORK that just issue a atfork handle, maybe using > ChaCha20 as virtually all other systems do, no per-thread state). Why no per-thread / per-cpu? It seems otherwise there will need to be some explicit synchronization on the stream. Regarding this patch, do we want to skip it and just wait on arc4random interface in kernel/glibc or should I go forward with it and some arch specific entropy functions in the mean-time? > > [1] https://www.youtube.com/watch?v=gp_90-3R0pE
Hi Florian, On Mon, Apr 4, 2022 at 5:00 PM Florian Weimer <fweimer@redhat.com> wrote: > > * Jason A. Donenfeld via Libc-alpha: > > > _However_, based on what people have said in this thread, this all > > seems highly unnecessary, since you just need some boring > > statistically uniform randomness without any crypto or security > > requirements of any kind, and it simply needs to be fast and dumb. If > > that's the wrong set of requirements for this problem (I still have no > > idea what the bigger picture here is), please pipe up. > > If we can make a cryptographically secure generator fast enough, it > would relieve programmers of the need to choose between that and another > generator that just gives some random-looking bits fast. If programmers > don't have to make a choice, they can't choose incorrectly (introducing > performance bugs or security bugs). > > For example, the system call overhead could be prohibitive for things > like hash map keys if these hash maps are not used much before they are > discarded again. Yes, if we want a general interface it's something we should do well and have it be cryptographically secure with no downsides bla bla bla. As mentioned, we can design something fancy with the vDSO if you think there's a compelling use case for this. Until your comments, though, everybody on this thread told me this was just some dinky internal interface only used for some sort of threading semantic. In that case, just name it "dinky_threading_id_thinger()" and call it a day? We are all talking about the same thing, right? Could you confirm/clarify the focus here? And if you've misunderstood what this is about, please say so too. I see the subsequent messages are now veering off into lala land since yours, so I'd like to make sure we're all on the same page, lest we waste time over nothing. Thanks, Jason
Hi Noah, On Mon, Apr 4, 2022 at 6:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha > <libc-alpha@sourceware.org> wrote: > > > > * Jason A. Donenfeld via Libc-alpha: > > > > > _However_, based on what people have said in this thread, this all > > > seems highly unnecessary, since you just need some boring > > > statistically uniform randomness without any crypto or security > > > requirements of any kind, and it simply needs to be fast and dumb. If > > > that's the wrong set of requirements for this problem (I still have no > > > idea what the bigger picture here is), please pipe up. > > > > If we can make a cryptographically secure generator fast enough, it > > would relieve programmers of the need to choose between that and another > > generator that just gives some random-looking bits fast. If programmers > > don't have to make a choice, they can't choose incorrectly (introducing > > performance bugs or security bugs). > > It sounds like you're talking about creating a user facing API. Since > random_bits > is internal do we really need so much ease of use at the cost of performance? Right, my thoughts exactly. If you just need some statistically uniform bytes for whatever it is whomever told me above something about threading something about no security something about really doesn't matter, then I fail to see why https://א.cc/xrAIhCvy/c is insufficient, and I also don't know why this thread is chasing its tail about rdtsc. I'd appreciate it if somebody in a leadership position could let me know what is asked of me here, since somebody CC'd me into this thread. I've already made two RNGs here for various spitballed objectives. I'd like to refrain from spending time on a third until there's some clarity on what it is you all want. Probably it makes sense for me to leave this thread alone for a bit while you guys work that out. Jason
On 04/04/2022 15:23, Noah Goldstein wrote: > On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 01/04/2022 15:01, Cristian Rodríguez wrote: >>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: >>> >>>> AFAIK our goal is entropy more so than security. For example >>>> if this is used to generate jiffies to stagger threads its not a security >>>> issue in any sense, it's just not ideal for performance. >>> >>> In any case this should be more than fast enough for the other use >>> cases of random_bits() .. maybe one new random_bits_fast() function >>> foe edge cases where even this is too slow? >> >> I think we are bike-shedding in the same issue OpenBSD guys stumbled >> and which they have solved 10 years ago [1]. Essentially, we need to >> come up with a internal PRNG interface that can be used internally and >> externally, instead of reinventing cleaver ways to use the timer as >> entropy source. >> >> The issue is not really the cypher used, ideally it could be replace if >> we find out that it does not fit. The main issue is to glue together >> all the requirements to have a concise internal interface, taking in >> consideration the glibc constrains to work with multiple kernel version >> and environments (where we can't assume we have access to a source or >> reliable entropy like getrandom syscall). >> >> My plan to rehearse Florian arc4random proposal to have some simpler >> to where we might improve upon (a simpler fork detection for kernels >> without MADV_WIPEONFORK that just issue a atfork handle, maybe using >> ChaCha20 as virtually all other systems do, no per-thread state). > > Why no per-thread / per-cpu? It seems otherwise there will need to be > some explicit synchronization on the stream. Mainly because it simplifies a lot the *initial* implementation. I would prefer to incremental add per-thread optimization than dump a large patch so we can review in integrate the code more easier. > > > Regarding this patch, do we want to skip it and just wait on arc4random > interface in kernel/glibc or should I go forward with it and some arch > specific entropy functions in the mean-time? I don't have a strong opinion on this patch, it does improve x86_64 latency on random_bits although currently internal usage are far from latency sensitive. It is really a microoptimization without much real work gain for current code. > >> >> [1] https://www.youtube.com/watch?v=gp_90-3R0pE
On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 04/04/2022 15:23, Noah Goldstein wrote: > > On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella > > <adhemerval.zanella@linaro.org> wrote: > >> > >> > >> > >> On 01/04/2022 15:01, Cristian Rodríguez wrote: > >>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > >>> > >>>> AFAIK our goal is entropy more so than security. For example > >>>> if this is used to generate jiffies to stagger threads its not a security > >>>> issue in any sense, it's just not ideal for performance. > >>> > >>> In any case this should be more than fast enough for the other use > >>> cases of random_bits() .. maybe one new random_bits_fast() function > >>> foe edge cases where even this is too slow? > >> > >> I think we are bike-shedding in the same issue OpenBSD guys stumbled > >> and which they have solved 10 years ago [1]. Essentially, we need to > >> come up with a internal PRNG interface that can be used internally and > >> externally, instead of reinventing cleaver ways to use the timer as > >> entropy source. > >> > >> The issue is not really the cypher used, ideally it could be replace if > >> we find out that it does not fit. The main issue is to glue together > >> all the requirements to have a concise internal interface, taking in > >> consideration the glibc constrains to work with multiple kernel version > >> and environments (where we can't assume we have access to a source or > >> reliable entropy like getrandom syscall). > >> > >> My plan to rehearse Florian arc4random proposal to have some simpler > >> to where we might improve upon (a simpler fork detection for kernels > >> without MADV_WIPEONFORK that just issue a atfork handle, maybe using > >> ChaCha20 as virtually all other systems do, no per-thread state). > > > > Why no per-thread / per-cpu? It seems otherwise there will need to be > > some explicit synchronization on the stream. > > Mainly because it simplifies a lot the *initial* implementation. I would > prefer to incremental add per-thread optimization than dump a large patch > so we can review in integrate the code more easier. > > > > > > > Regarding this patch, do we want to skip it and just wait on arc4random > > interface in kernel/glibc or should I go forward with it and some arch > > specific entropy functions in the mean-time? > > I don't have a strong opinion on this patch, it does improve x86_64 > latency on random_bits although currently internal usage are far from > latency sensitive. It is really a microoptimization without much real > work gain for current code. That's fair. The motivation is so random_bits can be used for lightweight jitter i.e in cases like: [v2] nptl: Add backoff mechanism to spinlock loop The x86 backend with `rdtsc` is just because it's a simple improvement, other arch will hopefully be able to get off syscall if they don't have vdso gettime. But agree the improvement gains a minimal so if people don't feel its worth the added complexity we can wait on a strong arch-random interface. > > > > >> > >> [1] https://www.youtube.com/watch?v=gp_90-3R0pE
On Mon, Apr 4, 2022 at 1:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hi Noah, > > On Mon, Apr 4, 2022 at 6:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha > > <libc-alpha@sourceware.org> wrote: > > > > > > * Jason A. Donenfeld via Libc-alpha: > > > > > > > _However_, based on what people have said in this thread, this all > > > > seems highly unnecessary, since you just need some boring > > > > statistically uniform randomness without any crypto or security > > > > requirements of any kind, and it simply needs to be fast and dumb. If > > > > that's the wrong set of requirements for this problem (I still have no > > > > idea what the bigger picture here is), please pipe up. > > > > > > If we can make a cryptographically secure generator fast enough, it > > > would relieve programmers of the need to choose between that and another > > > generator that just gives some random-looking bits fast. If programmers > > > don't have to make a choice, they can't choose incorrectly (introducing > > > performance bugs or security bugs). > > > > It sounds like you're talking about creating a user facing API. Since > > random_bits > > is internal do we really need so much ease of use at the cost of performance? > > Right, my thoughts exactly. If you just need some statistically > uniform bytes for whatever it is whomever told me above something > about threading something about no security something about really > doesn't matter, then I fail to see why https://א.cc/xrAIhCvy/c is > insufficient, and I also don't know why this thread is chasing its > tail about rdtsc. > > I'd appreciate it if somebody in a leadership position could let me > know what is asked of me here, since somebody CC'd me into this > thread. I've already made two RNGs here for various spitballed > objectives. I'd like to refrain from spending time on a third until > there's some clarity on what it is you all want. Probably it makes > sense for me to leave this thread alone for a bit while you guys work > that out. > Sorry it seems this thread has diverged in several directions. Think a seperate thread for the arc4random stuff would be better as its really its own thing. Regarding this patch which aims at making it easier to provide a lighter-weight random_bits function. Your code seems to make sense for systems without VDSO or some arch alternative that can be used to avoid syscalls. Generally my thought it saving the static storage (TLS really) is worth it so I'd prefer to keep the per-arch interface and make your code the generic solution but thats up for debate. Particularly arch specific is harder to maintain and this function is not particularly hot. So I see four possibilities: 1. Drop it. 2. Generic solution only (using code based on Jason's solution) 3. This patch and update generic solution with Jason's solution 4. This patch and keep gettime for the generic solution Anyone have opinions of 1/2/3/4? If not I'm partial to 3. > Jason
On 04/04/2022 15:52, Noah Goldstein wrote: > On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 04/04/2022 15:23, Noah Goldstein wrote: >>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella >>> <adhemerval.zanella@linaro.org> wrote: >>>> >>>> >>>> >>>> On 01/04/2022 15:01, Cristian Rodríguez wrote: >>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: >>>>> >>>>>> AFAIK our goal is entropy more so than security. For example >>>>>> if this is used to generate jiffies to stagger threads its not a security >>>>>> issue in any sense, it's just not ideal for performance. >>>>> >>>>> In any case this should be more than fast enough for the other use >>>>> cases of random_bits() .. maybe one new random_bits_fast() function >>>>> foe edge cases where even this is too slow? >>>> >>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled >>>> and which they have solved 10 years ago [1]. Essentially, we need to >>>> come up with a internal PRNG interface that can be used internally and >>>> externally, instead of reinventing cleaver ways to use the timer as >>>> entropy source. >>>> >>>> The issue is not really the cypher used, ideally it could be replace if >>>> we find out that it does not fit. The main issue is to glue together >>>> all the requirements to have a concise internal interface, taking in >>>> consideration the glibc constrains to work with multiple kernel version >>>> and environments (where we can't assume we have access to a source or >>>> reliable entropy like getrandom syscall). >>>> >>>> My plan to rehearse Florian arc4random proposal to have some simpler >>>> to where we might improve upon (a simpler fork detection for kernels >>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using >>>> ChaCha20 as virtually all other systems do, no per-thread state). >>> >>> Why no per-thread / per-cpu? It seems otherwise there will need to be >>> some explicit synchronization on the stream. >> >> Mainly because it simplifies a lot the *initial* implementation. I would >> prefer to incremental add per-thread optimization than dump a large patch >> so we can review in integrate the code more easier. >> >>> >>> >>> Regarding this patch, do we want to skip it and just wait on arc4random >>> interface in kernel/glibc or should I go forward with it and some arch >>> specific entropy functions in the mean-time? >> >> I don't have a strong opinion on this patch, it does improve x86_64 >> latency on random_bits although currently internal usage are far from >> latency sensitive. It is really a microoptimization without much real >> work gain for current code. > > That's fair. The motivation is so random_bits can be used for lightweight > jitter i.e in cases like: > > [v2] nptl: Add backoff mechanism to spinlock loop > > The x86 backend with `rdtsc` is just because it's a simple improvement, > other arch will hopefully be able to get off syscall if they don't have vdso > gettime. > But agree the improvement gains a minimal so if people don't feel its worth > the added complexity we can wait on a strong arch-random interface. I think for mutex optimization it would be better to just add a arch-specific jitter code and use the backoff optimization iff the arch-specific code is used. And maybe not tied to random_bits(), since I am not sure if an interface like arc4random would be good to use in such scenario (since a possible state reschedule might call getentropy with my add unexpected latency).
On Mon, Apr 4, 2022 at 2:20 PM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > > > On 04/04/2022 15:52, Noah Goldstein wrote: > > On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella > > <adhemerval.zanella@linaro.org> wrote: > >> > >> > >> > >> On 04/04/2022 15:23, Noah Goldstein wrote: > >>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella > >>> <adhemerval.zanella@linaro.org> wrote: > >>>> > >>>> > >>>> > >>>> On 01/04/2022 15:01, Cristian Rodríguez wrote: > >>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > >>>>> > >>>>>> AFAIK our goal is entropy more so than security. For example > >>>>>> if this is used to generate jiffies to stagger threads its not a security > >>>>>> issue in any sense, it's just not ideal for performance. > >>>>> > >>>>> In any case this should be more than fast enough for the other use > >>>>> cases of random_bits() .. maybe one new random_bits_fast() function > >>>>> foe edge cases where even this is too slow? > >>>> > >>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled > >>>> and which they have solved 10 years ago [1]. Essentially, we need to > >>>> come up with a internal PRNG interface that can be used internally and > >>>> externally, instead of reinventing cleaver ways to use the timer as > >>>> entropy source. > >>>> > >>>> The issue is not really the cypher used, ideally it could be replace if > >>>> we find out that it does not fit. The main issue is to glue together > >>>> all the requirements to have a concise internal interface, taking in > >>>> consideration the glibc constrains to work with multiple kernel version > >>>> and environments (where we can't assume we have access to a source or > >>>> reliable entropy like getrandom syscall). > >>>> > >>>> My plan to rehearse Florian arc4random proposal to have some simpler > >>>> to where we might improve upon (a simpler fork detection for kernels > >>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using > >>>> ChaCha20 as virtually all other systems do, no per-thread state). > >>> > >>> Why no per-thread / per-cpu? It seems otherwise there will need to be > >>> some explicit synchronization on the stream. > >> > >> Mainly because it simplifies a lot the *initial* implementation. I would > >> prefer to incremental add per-thread optimization than dump a large patch > >> so we can review in integrate the code more easier. > >> > >>> > >>> > >>> Regarding this patch, do we want to skip it and just wait on arc4random > >>> interface in kernel/glibc or should I go forward with it and some arch > >>> specific entropy functions in the mean-time? > >> > >> I don't have a strong opinion on this patch, it does improve x86_64 > >> latency on random_bits although currently internal usage are far from > >> latency sensitive. It is really a microoptimization without much real > >> work gain for current code. > > > > That's fair. The motivation is so random_bits can be used for lightweight > > jitter i.e in cases like: > > > > [v2] nptl: Add backoff mechanism to spinlock loop > > > > The x86 backend with `rdtsc` is just because it's a simple improvement, > > other arch will hopefully be able to get off syscall if they don't have vdso > > gettime. > > But agree the improvement gains a minimal so if people don't feel its worth > > the added complexity we can wait on a strong arch-random interface. > > I think for mutex optimization it would be better to just add a arch-specific > jitter code and use the backoff optimization iff the arch-specific code is > used. And maybe not tied to random_bits(), since I am not sure if an interface > like arc4random would be good to use in such scenario (since a possible state > reschedule might call getentropy with my add unexpected latency). So you propose drop this patch in favor of some new internal interface like get_fast_jitter() and a define such as 'HAS_FAST_JITTER'?
On 04/04/2022 16:48, Noah Goldstein wrote: > On Mon, Apr 4, 2022 at 2:20 PM Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 04/04/2022 15:52, Noah Goldstein wrote: >>> On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella >>> <adhemerval.zanella@linaro.org> wrote: >>>> >>>> >>>> >>>> On 04/04/2022 15:23, Noah Goldstein wrote: >>>>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella >>>>> <adhemerval.zanella@linaro.org> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 01/04/2022 15:01, Cristian Rodríguez wrote: >>>>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: >>>>>>> >>>>>>>> AFAIK our goal is entropy more so than security. For example >>>>>>>> if this is used to generate jiffies to stagger threads its not a security >>>>>>>> issue in any sense, it's just not ideal for performance. >>>>>>> >>>>>>> In any case this should be more than fast enough for the other use >>>>>>> cases of random_bits() .. maybe one new random_bits_fast() function >>>>>>> foe edge cases where even this is too slow? >>>>>> >>>>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled >>>>>> and which they have solved 10 years ago [1]. Essentially, we need to >>>>>> come up with a internal PRNG interface that can be used internally and >>>>>> externally, instead of reinventing cleaver ways to use the timer as >>>>>> entropy source. >>>>>> >>>>>> The issue is not really the cypher used, ideally it could be replace if >>>>>> we find out that it does not fit. The main issue is to glue together >>>>>> all the requirements to have a concise internal interface, taking in >>>>>> consideration the glibc constrains to work with multiple kernel version >>>>>> and environments (where we can't assume we have access to a source or >>>>>> reliable entropy like getrandom syscall). >>>>>> >>>>>> My plan to rehearse Florian arc4random proposal to have some simpler >>>>>> to where we might improve upon (a simpler fork detection for kernels >>>>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using >>>>>> ChaCha20 as virtually all other systems do, no per-thread state). >>>>> >>>>> Why no per-thread / per-cpu? It seems otherwise there will need to be >>>>> some explicit synchronization on the stream. >>>> >>>> Mainly because it simplifies a lot the *initial* implementation. I would >>>> prefer to incremental add per-thread optimization than dump a large patch >>>> so we can review in integrate the code more easier. >>>> >>>>> >>>>> >>>>> Regarding this patch, do we want to skip it and just wait on arc4random >>>>> interface in kernel/glibc or should I go forward with it and some arch >>>>> specific entropy functions in the mean-time? >>>> >>>> I don't have a strong opinion on this patch, it does improve x86_64 >>>> latency on random_bits although currently internal usage are far from >>>> latency sensitive. It is really a microoptimization without much real >>>> work gain for current code. >>> >>> That's fair. The motivation is so random_bits can be used for lightweight >>> jitter i.e in cases like: >>> >>> [v2] nptl: Add backoff mechanism to spinlock loop >>> >>> The x86 backend with `rdtsc` is just because it's a simple improvement, >>> other arch will hopefully be able to get off syscall if they don't have vdso >>> gettime. >>> But agree the improvement gains a minimal so if people don't feel its worth >>> the added complexity we can wait on a strong arch-random interface. >> >> I think for mutex optimization it would be better to just add a arch-specific >> jitter code and use the backoff optimization iff the arch-specific code is >> used. And maybe not tied to random_bits(), since I am not sure if an interface >> like arc4random would be good to use in such scenario (since a possible state >> reschedule might call getentropy with my add unexpected latency). > > So you propose drop this patch in favor of some new internal interface like > get_fast_jitter() and a define such as 'HAS_FAST_JITTER'? Yeah, this would be better than try to improve random_bits. Also, recently we are trying to avoid internal HAS/HAVE defines, it is better to add a generic interface that does either nothing or return a default value (so default get_fast_jitter() would return 1 or 0, so the mutex code will se it just do the current spinning without backoff).
On Mon, Apr 4, 2022 at 2:32 PM Jason A. Donenfeld via Libc-alpha <libc-alpha@sourceware.org> wrote: >re, since somebody CC'd me into this > thread. I've already made two RNGs here for various spitballed > objectives. I added you to this conversation because I wanted to hear the opinion from someone with good knowledge on the subject. in particular to evaluate the possibility of adding a better, fast PRNG for non-crypto purposes for internal use of the C library, since the existing code isn't quite "random", This revived once again the topic of introducing BSD's arc4random API to glibc, a subject that has been circulating for quite some time,
Hi Cristian, On Tue, Apr 5, 2022 at 2:10 AM Cristian Rodríguez <crrodriguez@opensuse.org> wrote: > I added you to this conversation because I wanted to hear the opinion > from someone with good knowledge on the subject. in particular to > evaluate the possibility of adding a better, fast PRNG for non-crypto > purposes for internal use of the C library, since the existing code > isn't quite "random", > > This revived once again the topic of introducing BSD's arc4random API > to glibc, a subject that has been circulating for quite some time, Ahh, okay, thanks for clarifying. So the arc4random stuff isn't really part of this discussion, but the faster non-crypto prng thing is. In that case, just go with https://א.cc/xrAIhCvy/c and be done with it? Any objections to that approach? Jason
* Jason A. Donenfeld: >> > _However_, based on what people have said in this thread, this all >> > seems highly unnecessary, since you just need some boring >> > statistically uniform randomness without any crypto or security >> > requirements of any kind, and it simply needs to be fast and dumb. If >> > that's the wrong set of requirements for this problem (I still have no >> > idea what the bigger picture here is), please pipe up. >> >> If we can make a cryptographically secure generator fast enough, it >> would relieve programmers of the need to choose between that and another >> generator that just gives some random-looking bits fast. If programmers >> don't have to make a choice, they can't choose incorrectly (introducing >> performance bugs or security bugs). >> >> For example, the system call overhead could be prohibitive for things >> like hash map keys if these hash maps are not used much before they are >> discarded again. > > Yes, if we want a general interface it's something we should do well > and have it be cryptographically secure with no downsides bla bla bla. > As mentioned, we can design something fancy with the vDSO if you think > there's a compelling use case for this. > > Until your comments, though, everybody on this thread told me this was > just some dinky internal interface only used for some sort of > threading semantic. In that case, just name it > "dinky_threading_id_thinger()" and call it a day? We are all talking > about the same thing, right? Could you confirm/clarify the focus here? > And if you've misunderstood what this is about, please say so too. I > see the subsequent messages are now veering off into lala land since > yours, so I'd like to make sure we're all on the same page, lest we > waste time over nothing. We have multiple internal users of random_bits in the glibc source tree. Some of these users should switch to something stronger because they are used for security hardening. So practical unpredictability matters, although we do not need an approved generator. Thanks, Florian
* Noah Goldstein: > On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha > <libc-alpha@sourceware.org> wrote: >> >> * Jason A. Donenfeld via Libc-alpha: >> >> > _However_, based on what people have said in this thread, this all >> > seems highly unnecessary, since you just need some boring >> > statistically uniform randomness without any crypto or security >> > requirements of any kind, and it simply needs to be fast and dumb. If >> > that's the wrong set of requirements for this problem (I still have no >> > idea what the bigger picture here is), please pipe up. >> >> If we can make a cryptographically secure generator fast enough, it >> would relieve programmers of the need to choose between that and another >> generator that just gives some random-looking bits fast. If programmers >> don't have to make a choice, they can't choose incorrectly (introducing >> performance bugs or security bugs). > > It sounds like you're talking about creating a user facing API. Since > random_bits is internal do we really need so much ease of use at the > cost of performance? We already have uses of random_bits which are somewhat dodgy, for example in __res_context_mkquery. The mutex use case, where speed is strictly more important than unpredictability, is probably an outlier. Thanks, Florian
On Mon, Apr 4, 2022 at 8:18 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Any objections to that approach?
I personally would prefer the siphash-based approach.. which is
probably the best approach for most cases not on a hot codepath..
diff --git a/include/random-bits.h b/include/random-bits.h index 17665b479a..016b87576c 100644 --- a/include/random-bits.h +++ b/include/random-bits.h @@ -19,21 +19,17 @@ #ifndef _RANDOM_BITS_H # define _RANDOM_BITS_H -#include <time.h> -#include <stdint.h> +# include <random-bits-entropy.h> +# include <stdint.h> -/* Provides fast pseudo-random bits through clock_gettime. It has unspecified - starting time, nano-second accuracy, its randomness is significantly better - than gettimeofday, and for mostly architectures it is implemented through - vDSO instead of a syscall. Since the source is a system clock, the upper - bits will have less entropy. */ +/* Provides fast pseudo-random bits through architecture specific + random_bits_entropy. Expectation is source is some timing function so + the upper bits have less entropy. */ static inline uint32_t random_bits (void) { - struct __timespec64 tv; - __clock_gettime64 (CLOCK_MONOTONIC, &tv); + uint32_t ret = random_bits_entropy (); /* Shuffle the lower bits to minimize the clock bias. */ - uint32_t ret = tv.tv_nsec ^ tv.tv_sec; ret ^= (ret << 24) | (ret >> 8); return ret; } diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h new file mode 100644 index 0000000000..53290c7f7a --- /dev/null +++ b/sysdeps/generic/random-bits-entropy.h @@ -0,0 +1,31 @@ +/* Fast function for generating entropy of random_bits. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <stdint.h> +#include <time.h> + +/* Generically use clock_gettime. It has unspecified starting time, nano-second + accuracy, its randomness is significantly better than gettimeofday, and for + mostly architectures it is implemented through vDSO instead of a syscall. */ +static inline uint32_t +random_bits_entropy (void) +{ + struct __timespec64 tv; + __clock_gettime64 (CLOCK_MONOTONIC, &tv); + return tv.tv_nsec ^ tv.tv_sec; +}