Message ID | 20210507081547.6945-1-andrea.righi@canonical.com |
---|---|
Series | AWS: fix out of entropy on Graviton 2 instances types (mg6.*) |
On Fri, May 7, 2021 at 5:16 AM Andrea Righi <andrea.righi@canonical.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/1927692
>
> [Impact]
>
> AWS Graviton 2 instances do not have enough entropy available at boot,
> so any task that requires entropy (even reading a few bytes from
> /dev/random) will be stuck forever.
>
> [Fix]
>
> The proper fix for this problem is to correctly refill the entropy pool
> with some real random data using some hardware-generated randomness.
>
> In the meantime a reasonable workaround can be to apply the following
> upstream commits:
>
>   30c08efec888 random: make /dev/random be almost like /dev/urandom
>   48446f198f9a random: ignore GRND_RANDOM in getentropy(2)
>   75551dbf112c random: add GRND_INSECURE to return best-effort non-cryptographic bytes
>   c6f1deb15878 random: Add a urandom_read_nowait() for random APIs that don't warn
>   4c8d062186d9 random: Don't wake crng_init_wait when crng_init == 1
>
> In this way the system will not run out of entropy and will be able to
> provide best-effort randomness in any case, preventing the out of
> entropy issue on the AWS Graviton 2 instances.
>
> [Test plan]
>
> Execute the following command on any m6g instance:
>
>   dd bs=32 count=1 if=/dev/random of=/dev/null
>
> This should return quickly; if not, it means that the system does not
> have enough entropy available. When the problem happens this command
> hangs forever.
>
> [Where problems could occur]
>
> These changes make the read semantics of /dev/random the same as
> /dev/urandom, except that reads will block until the CRNG is ready. This
> should not materially break any API. Any code that worked without these
> changes should work at least as well as before. However, applications
> that have strict randomness requirements might be affected by the
> provided best-effort randomness, so we may need to apply more
> commits/changes to introduce proper hardware entropy support on
> Graviton 2 instances to provide a better quality of randomness. In the
> meantime these upstream changes constitute a reasonable workaround to
> prevent applications from hanging forever on the m6g.* instances.
>
> ----------------------------------------------------------------
> Andy Lutomirski (5):
>   random: add GRND_INSECURE to return best-effort non-cryptographic bytes
>   random: Don't wake crng_init_wait when crng_init == 1
>   random: Add a urandom_read_nowait() for random APIs that don't warn
>   random: ignore GRND_RANDOM in getentropy(2)
>   random: make /dev/random be almost like /dev/urandom
>
> drivers/char/random.c | 81 +++++++++++++++++++++++++++++++++------------------------------------------------
> include/uapi/linux/random.h | 4 +++-
> 2 files changed, 36 insertions(+), 49 deletions(-)
>
>

Thanks Andrea, LGTM. I wonder if we plan to apply these commits to all
5.4-based kernels - are they in 5.8? If so, I feel it is worth adding
them to all 5.4-based kernels; entropy blocking is a PITA and usually
leads to multiple complaints due to boot problems. I understand though
that this is more urgent for AWS...so mandatory to apply in F/AWS!
That said:

Acked-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
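The test plan quoted above relies on `dd` hanging to reveal the problem. As a minimal sketch, the same check can be done with a timeout so that a starved system is reported instead of blocking forever. The helper name `read_with_timeout` and the 5-second default are illustrative assumptions, not part of the patch set.

```python
import os
import select


def read_with_timeout(path="/dev/random", nbytes=32, timeout=5.0):
    """Return up to nbytes read from path, or None if nothing is readable in time."""
    # O_NONBLOCK keeps both open() and read() from waiting on the entropy pool.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # Wait until the device reports readable data, or give up after timeout.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            return None
        try:
            return os.read(fd, nbytes)
        except BlockingIOError:
            # The device became unreadable between select() and read().
            return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    data = read_with_timeout()
    if data is None:
        print("read from /dev/random would block: not enough entropy?")
    else:
        print("read %d bytes from /dev/random" % len(data))
```

A `None` result on an m6g instance would correspond to the `dd` command hanging; a quick 32-byte result would correspond to `dd` returning immediately.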
Acked-by: Tim Gardner <tim.gardner@canonical.com>

I'm not sure I fully understand patch 5, but it is a clean cherry-pick and
testing shows it to at least not block anymore. As for how random the
information is that is returned I can't say.
On Fri, May 07, 2021 at 07:58:09AM -0300, Guilherme Piccoli wrote:
> Thanks Andrea, LGTM. I wonder if we plan to apply these commits to all
> 5.4-based kernels - are they in 5.8? If so, I feel it is worth adding
> them to all 5.4-based kernels; entropy blocking is a PITA and usually
> leads to multiple complaints due to boot problems. I understand though
> that this is more urgent for AWS...so mandatory to apply in F/AWS!
> That said:
>
> Acked-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Thanks for the review Guilherme. These commits are already applied to all
our kernels >= 5.8, and I agree that this patch set should probably
target all 5.4 kernels (especially the cloud kernels, which can easily
run out of entropy). However, I would do more tests and more
investigation before applying it across the board, since it doesn't seem
to be a blocker for the other kernels.

-Andrea
On Fri, May 07, 2021 at 05:31:09AM -0600, Tim Gardner wrote:
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> I'm not sure I fully understand patch 5, but it is a clean cherry-pick and
> testing shows it to at least not block anymore. As for how random the
> information is that is returned I can't say.

Thanks for the review, Tim.

Patch 5 changes the read semantics of /dev/random.

Before, the kernel was using two separate pools of random data: one for
/dev/random and another for /dev/urandom. The pool for /dev/random was a
blocking pool (reads blocked until enough entropy was available) filled
with "real" random data.

After the change, the blocking pool is no longer used by /dev/random
reads: reads only block until the CRNG (cryptographic random number
generator) has been initialized (see the function crng_ready()). Once
the CRNG is initialized, reads from /dev/random never block and consume
data generated by the CRNG and real random events.

Basically, after the change the kernel trusts the numbers generated by
the CRNG, whereas before we were trusting only numbers generated by
truly random events.

This change is covered very well in this article:
https://lwn.net/Articles/808575/

-Andrea
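The semantics described for patch 5 (reads block only until the CRNG has been initialized) can be probed from user space through getrandom(2). The following is a minimal sketch, assuming Linux and Python 3.6 or later, where `os.getrandom()` and `os.GRND_NONBLOCK` are available; the helper name `crng_initialized` is hypothetical and is not the kernel's `crng_ready()`.

```python
import os


def crng_initialized():
    """Return True if getrandom(2) can hand out bytes without blocking."""
    try:
        # GRND_NONBLOCK makes the call fail with EAGAIN instead of waiting
        # for the kernel CRNG to be initialized.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


if __name__ == "__main__":
    print("CRNG initialized:", crng_initialized())
```

With the five commits applied, a `True` result here would suggest that reads from /dev/random no longer block either, since both paths wait only for CRNG initialization.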
On Fri, May 7, 2021 at 10:03 AM Andrea Righi <andrea.righi@canonical.com> wrote:
> Thanks for the review Guilherme. These commits are already applied to all
> our kernels >= 5.8, and I agree that this patch set should probably
> target all 5.4 kernels (especially the cloud kernels, which can easily
> run out of entropy). However, I would do more tests and more
> investigation before applying it across the board, since it doesn't seem
> to be a blocker for the other kernels.
>
> -Andrea

Makes sense, thank you Andrea =)
On 5/7/21 7:26 AM, Andrea Righi wrote:
> This change is covered very well in this article:
> https://lwn.net/Articles/808575/
>
> -Andrea

Thanks for the pointer. That was quite informative.

rtg
-----------
Tim Gardner
Canonical, Inc
Applied to focal:linux-aws master. Thanks.

-rtg