From patchwork Mon Sep 2 17:38:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 1979708 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=u/FlGJno; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WyGGF6X7hz1yXY for ; Tue, 3 Sep 2024 03:39:17 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9D169385DDE5 for ; Mon, 2 Sep 2024 17:39:15 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by sourceware.org (Postfix) with ESMTPS id 730B83858C33 for ; Mon, 2 Sep 2024 17:38:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 730B83858C33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 730B83858C33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::52c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725298740; cv=none; b=cgQgkB9NUV/XwVQxn4JAbVW189gduYmxvbjFzNauTfRHP4eTiKhLXzJox0Ognwnr/gIZZ2fi6Id5g9Ryzs2FUBF7X95NPuR+t24ufunetXUvKVBcvS+vuetOEICM4IyepUD+xIURG2tG2QMvVaebmWVT95oo4RwsjoHOdZPL5yM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1725298740; c=relaxed/simple; bh=NLwPCvutcRk37KuUxWU2lNDy8a3XOluB5iBoJoY9/BA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=SaDMn+oLaRE/vLEKv/ZohBO2PdpumDhU8vxwNmIoQTNSpOCen6fMGR/O/a94yJ7RyYrAEmZde3w39CnupHAH7HJ33rSE5pmkJIoBlYt+7odKf8qNNaCyNafx3Q+GxqoplmSkK6GxCZsktDGW/dbcj8gX0VE3QXpPNC+mA0/9xXk= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pg1-x52c.google.com with SMTP id 41be03b00d2f7-7cd8803fe0aso2879162a12.0 for ; Mon, 02 Sep 2024 10:38:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1725298735; x=1725903535; darn=sourceware.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=yi8uMNLnws+MVyGB3CM7R6jRL9kxjorLsREBLmmP/m4=; b=u/FlGJno8pYJFUuwwVIDLGJCYVBnrAUB0NpPxPfzejD5OzhxUu+9STTAR/AWf4gwRP GhPuHeDHpqpJ9uUdV/hbf2r8x0H+GvpmbMN6Rmlai7puZlYAJKhCWEZyWvO96UfUqvax MPFwXTDmlskIDTHXmi8nIuPnhvz6C+/vv2kYpMKN+EQ5tMxDwZZqrDLaQXFj1wQTtMhm GIUs5zx+AZrV8CX6TooXtgc5lIOEw5MsvhX3NBDIntUsHui5F715MmjdrU4/uH4YFR6P RAPuanaikU5clCCIfvFZvdkq9pY66L/hiEcwI/Sb2Px2K+C/GkW7PDKInKanUk6C9mbr 2M+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725298735; x=1725903535; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yi8uMNLnws+MVyGB3CM7R6jRL9kxjorLsREBLmmP/m4=; b=H8Pi++ojBluUa5Ed/VT13hkrMBN+q/bovM5QahZqzt3dm96Y2nqu/tUVH8BAgwrHyx keL4XQbkMX0x2dqQCbQaqX0kHIuMV/+7jiohwXqETxv+IfExEKW/RVX/ui26h/2o256n qQQHQQj5TXKdtPu+2l35A9cQCdq7canw+qw+WiBUAOp7BjNTLyY48aWhyy3eNslA1UJd Vuf4wQLt1U1sHaHr38/yFYvAAi4FJA9OfZbYLegG5dc0N1gSHIkajlS8JOpjWvMBL1WO 9bvBbfHyLUpBYu1ce4PY39xa12ZM5NiwlT9YA9iwxWszigTVMXzJ+suq5kgP5djuJo7B lLPg== X-Gm-Message-State: AOJu0Yy/kkoFnQY7+X17LeRhkP2jUb2ED4f5VYBt/Sr6NFIRdxyHyvmR pfR6pkRkHLA6lfDrMPWR95QV/pwfxxxDLEPv23vdEHlHeyDGUU1TmBKvq2tGDatNRA7CAje8iFI 5Vno= X-Google-Smtp-Source: AGHT+IFKnuYj4E75q2ZOplP7XfBp0Tyx7VmyArYhWzsewTdUPUJg93ciCCU7nEHS+f90WuAkrEhCSw== X-Received: by 2002:a17:903:228a:b0:205:86f5:3351 with SMTP id d9443c01a7336-20586f537edmr29658485ad.41.1725298734426; Mon, 02 Sep 2024 10:38:54 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c3:e912:21a:cf7f:3adf:1e7f]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2053e12adf9sm44541715ad.85.2024.09.02.10.38.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Sep 2024 10:38:54 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org Cc: Florian Weimer , "Jason A . Donenfeld" Subject: [PATCH v4] linux: Add support for getrandom vDSO Date: Mon, 2 Sep 2024 14:38:27 -0300 Message-ID: <20240902173846.637149-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org Linux 6.11 will support calling getrandom() from the vDSO (commit 4ad10a5f5f78a5b3e525a63bd075a4eb1139dde1). It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO. Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated. To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always see a consistent state even if interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque states will be either in the free states list (grnd_alloc.states) or allocated to a running thread. It's currently enabled for x86_64. As the kernel support gains more platforms (arm64 is in the works), this can be easily turned on for those. Checked on x86_64-linux-gnu. Co-authored-by: Jason A. Donenfeld NB: This potentially makes getrandom a non-cancellable entrypoint if the vDSO syscall is called for any reason. This is something we're going to address later, after this lands. --- The loongarch support is already on random tree, while aarch64 and powerpc support is under review. Changes from v3: * Query the vgetrandom mmap parameters at loading time, instead of each block allocation. Changes from v2: * Move the getrandom opaque state buffer to 'struct pthread'. * Move the state release to start_thread and after signals are blocked, to avoid a possible concurrency update. * Move the state reset on fork() to reclaim_stack. This makes all the logic of handling threads on one place, and simplify the __getrandom_fork_subprocess (no need to extra argument). * Fixed some style issue on comments. * Do not use mremap to reallocate the free states, it avoids a possible fork() issue where the allocator state pointer update is interrupted just after the mremap call returns, but before the assignment. This is done by mmap/munmap a a new buffer with the new size, a fork() will not invalidate the old buffer. * Block all signals before taking the lock to avoid the _Fork() issue if it interrupts getrandom while the lock is taken. --- include/sys/random.h | 4 + malloc/malloc.c | 4 +- nptl/allocatestack.c | 2 + nptl/descr.h | 3 + nptl/pthread_create.c | 5 + sysdeps/generic/not-cancel.h | 4 +- sysdeps/mach/hurd/not-cancel.h | 4 +- sysdeps/nptl/_Fork.c | 2 + sysdeps/nptl/fork.h | 12 + sysdeps/unix/sysv/linux/dl-vdso-setup.c | 12 + sysdeps/unix/sysv/linux/dl-vdso-setup.h | 17 ++ sysdeps/unix/sysv/linux/getrandom.c | 220 ++++++++++++++++++- sysdeps/unix/sysv/linux/getrandom_vdso.h | 36 +++ sysdeps/unix/sysv/linux/include/sys/random.h | 11 + sysdeps/unix/sysv/linux/not-cancel.h | 7 +- sysdeps/unix/sysv/linux/x86_64/sysdep.h | 1 + 16 files changed, 337 insertions(+), 7 deletions(-) create mode 100644 sysdeps/unix/sysv/linux/getrandom_vdso.h create mode 100644 sysdeps/unix/sysv/linux/include/sys/random.h diff --git a/include/sys/random.h b/include/sys/random.h index 6aa313d35d..35f64a0339 100644 --- a/include/sys/random.h +++ b/include/sys/random.h @@ -1,8 +1,12 @@ #ifndef _SYS_RANDOM_H #include +#include_next + # ifndef _ISOMAC +# include + extern ssize_t __getrandom (void *__buffer, size_t __length, unsigned int __flags) __wur; libc_hidden_proto (__getrandom) diff --git a/malloc/malloc.c b/malloc/malloc.c index bcb6e5b83c..9e577ab900 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -3140,8 +3140,8 @@ static void tcache_key_initialize (void) { /* We need to use the _nostatus version here, see BZ 29624. */ - if (__getrandom_nocancel_nostatus (&tcache_key, sizeof(tcache_key), - GRND_NONBLOCK) + if (__getrandom_nocancel_nostatus_direct (&tcache_key, sizeof(tcache_key), + GRND_NONBLOCK) != sizeof (tcache_key)) { tcache_key = random_bits (); diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 2cb562f8ea..d9adb5856c 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -132,6 +132,8 @@ get_cached_stack (size_t *sizep, void **memp) __libc_lock_init (result->exit_lock); memset (&result->tls_state, 0, sizeof result->tls_state); + result->getrandom_buf = NULL; + /* Clear the DTV. */ dtv_t *dtv = GET_DTV (TLS_TPADJ (result)); for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt) diff --git a/nptl/descr.h b/nptl/descr.h index 65d3baaee3..989995262b 100644 --- a/nptl/descr.h +++ b/nptl/descr.h @@ -404,6 +404,9 @@ struct pthread /* Used on strsignal. */ struct tls_internal_t tls_state; + /* getrandom vDSO per-thread opaque state. */ + void *getrandom_buf; + /* rseq area registered with the kernel. Use a custom definition here to isolate from kernel struct rseq changes. The implementation of sched_getcpu needs acccess to the cpu_id field; diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 1d3665d5ed..d1f5568b3b 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -38,6 +38,7 @@ #include #include #include +#include #include @@ -549,6 +550,10 @@ start_thread (void *arg) } #endif + /* Release the vDSO getrandom per-thread buffer with all signal blocked, + to avoid creating a new free-state block during thread release. */ + __getrandom_vdso_release (pd); + if (!pd->user_stack) advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, pd->guardsize); diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h index 2dd1064600..8e3f49cc07 100644 --- a/sysdeps/generic/not-cancel.h +++ b/sysdeps/generic/not-cancel.h @@ -51,7 +51,9 @@ __fcntl64 (fd, cmd, __VA_ARGS__) #define __getrandom_nocancel(buf, size, flags) \ __getrandom (buf, size, flags) -#define __getrandom_nocancel_nostatus(buf, size, flags) \ +#define __getrandom_nocancel_direct(buf, size, flags) \ + __getrandom (buf, size, flags) +#define __getrandom_nocancel_nostatus_direct(buf, size, flags) \ __getrandom (buf, size, flags) #define __poll_infinity_nocancel(fds, nfds) \ __poll (fds, nfds, -1) diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h index 69fb3c00ef..ec5f5aa895 100644 --- a/sysdeps/mach/hurd/not-cancel.h +++ b/sysdeps/mach/hurd/not-cancel.h @@ -79,7 +79,7 @@ __typeof (__fcntl) __fcntl_nocancel; /* Non cancellable getrandom syscall that does not also set errno in case of failure. */ static inline ssize_t -__getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags) +__getrandom_nocancel_nostatus_direct (void *buf, size_t buflen, unsigned int flags) { int save_errno = errno; ssize_t r = __getrandom (buf, buflen, flags); @@ -90,6 +90,8 @@ __getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags) #define __getrandom_nocancel(buf, size, flags) \ __getrandom (buf, size, flags) +#define __getrandom_nocancel_direct(buf, size, flags) \ + __getrandom (buf, size, flags) #define __poll_infinity_nocancel(fds, nfds) \ __poll (fds, nfds, -1) diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c index ef199ddbc3..adb7c18b29 100644 --- a/sysdeps/nptl/_Fork.c +++ b/sysdeps/nptl/_Fork.c @@ -18,6 +18,7 @@ #include #include +#include pid_t _Fork (void) @@ -43,6 +44,7 @@ _Fork (void) self->robust_head.list = &self->robust_head; INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, sizeof (struct robust_list_head)); + call_function_static_weak (__getrandom_fork_subprocess); } return pid; } diff --git a/sysdeps/nptl/fork.h b/sysdeps/nptl/fork.h index 7643926df9..106b2cf71d 100644 --- a/sysdeps/nptl/fork.h +++ b/sysdeps/nptl/fork.h @@ -26,6 +26,7 @@ #include #include #include +#include static inline void fork_system_setup (void) @@ -46,6 +47,7 @@ fork_system_setup_after_fork (void) call_function_static_weak (__mq_notify_fork_subprocess); call_function_static_weak (__timer_fork_subprocess); + call_function_static_weak (__getrandom_fork_subprocess); } /* In case of a fork() call the memory allocation in the child will be @@ -128,9 +130,19 @@ reclaim_stacks (void) curp->specific_used = true; } } + + call_function_static_weak (__getrandom_reset_state, curp); } } + /* Also reset stale getrandom states for user stack threads. */ + list_for_each (runp, &GL (dl_stack_user)) + { + struct pthread *curp = list_entry (runp, struct pthread, list); + if (curp != self) + call_function_static_weak (__getrandom_reset_state, curp); + } + /* Add the stack of all running threads to the cache. */ list_splice (&GL (dl_stack_used), &GL (dl_stack_cache)); diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c index 3a44944dbb..b117a25922 100644 --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c @@ -66,6 +66,18 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres) (clockid_t, PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t, struct __timespec64 *) RELRO; # endif +# ifdef HAVE_GETRANDOM_VSYSCALL +PROCINFO_CLASS ssize_t (*_dl_vdso_getrandom) (void *buffer, size_t len, + unsigned int flags, void *state, + size_t state_len) RELRO; +/* These values will be initialized at loading time by calling the + _dl_vdso_getrandom with a special value. The 'state_size' is the opaque + state size per-thread allocated with a mmap using 'mmap_prot' and + 'mmap_flags' argument. */ +PROCINFO_CLASS uint32_t _dl_vdso_getrandom_state_size RELRO; +PROCINFO_CLASS uint32_t _dl_vdso_getrandom_mmap_prot RELRO; +PROCINFO_CLASS uint32_t _dl_vdso_getrandom_mmap_flags RELRO; +# endif /* PowerPC specific ones. */ # ifdef HAVE_GET_TBFREQ diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h index 8aee5a8212..c63b7689e5 100644 --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h @@ -19,6 +19,10 @@ #ifndef _DL_VDSO_INIT_H #define _DL_VDSO_INIT_H +#ifdef HAVE_GETRANDOM_VSYSCALL +# include +#endif + /* Initialize the VDSO functions pointers. */ static inline void __attribute__ ((always_inline)) setup_vdso_pointers (void) @@ -50,6 +54,19 @@ setup_vdso_pointers (void) #ifdef HAVE_RISCV_HWPROBE GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE); #endif +#ifdef HAVE_GETRANDOM_VSYSCALL + GLRO(dl_vdso_getrandom) = dl_vdso_vsym (HAVE_GETRANDOM_VSYSCALL); + if (GLRO(dl_vdso_getrandom) != NULL) + { + struct vgetrandom_opaque_params params; + if (GLRO(dl_vdso_getrandom) (NULL, 0, 0, ¶ms, ~0UL) == 0) + { + GLRO(dl_vdso_getrandom_state_size) = params.size_of_opaque_state; + GLRO(dl_vdso_getrandom_mmap_prot) = params.mmap_prot; + GLRO(dl_vdso_getrandom_mmap_flags) = params.mmap_flags; + } + } +#endif } #endif diff --git a/sysdeps/unix/sysv/linux/getrandom.c b/sysdeps/unix/sysv/linux/getrandom.c index 777d1decf0..7cdae300ff 100644 --- a/sysdeps/unix/sysv/linux/getrandom.c +++ b/sysdeps/unix/sysv/linux/getrandom.c @@ -21,12 +21,230 @@ #include #include +#ifdef HAVE_GETRANDOM_VSYSCALL +# include +# include +# include +# include +# include +# include +# include +# include + +# define ALIGN_PAGE(p) PTR_ALIGN_UP (p, GLRO (dl_pagesize)) +# define READ_ONCE(p) (*((volatile typeof (p) *) (&(p)))) +# define WRITE_ONCE(p, v) (*((volatile typeof (p) *) (&(p))) = (v)) +# define RESERVE_PTR(p) ((void *) ((uintptr_t) (p) | 1UL)) +# define RELEASE_PTR(p) ((void *) ((uintptr_t) (p) & ~1UL)) +# define IS_RESERVED_PTR(p) (!!((uintptr_t) (p) & 1UL)) + +static struct +{ + __libc_lock_define (, lock); + + void **states; /* Queue of opaque states allocated with the kernel + provided flags and used on getrandom vDSO call. */ + size_t len; /* Number of available free states in the queue. */ + size_t total; /* Number of states allocated from the kernel. */ + size_t cap; /* Total numver of states that 'states' can hold before + needed to be resized. */ +} grnd_alloc = { + .lock = LLL_LOCK_INITIALIZER +}; + +static bool +vgetrandom_get_state_alloc (void) +{ + size_t num = __get_nprocs (); /* Just a decent heuristic. */ + + size_t block_size = ALIGN_PAGE (num * GLRO(dl_vdso_getrandom_state_size)); + num = (GLRO (dl_pagesize) / GLRO(dl_vdso_getrandom_state_size)) * + (block_size / GLRO (dl_pagesize)); + void *block = __mmap (NULL, block_size, GLRO(dl_vdso_getrandom_mmap_prot), + GLRO(dl_vdso_getrandom_mmap_flags), -1, 0); + if (block == MAP_FAILED) + return false; + __set_vma_name (block, block_size, " glibc: getrandom"); + + if (grnd_alloc.total + num > grnd_alloc.cap) + { + /* Use a new mmap instead of trying to mremap. It avoids a + potential multithread fork issue where fork is called just after + mremap returns but before assigning to the grnd_alloc.states, + thus making the its value invalid in the child. */ + void *old_states = grnd_alloc.states; + size_t old_states_size = ALIGN_PAGE (sizeof (*grnd_alloc.states) * + grnd_alloc.total + num); + size_t states_size; + if (grnd_alloc.states == NULL) + states_size = old_states_size; + else + states_size = ALIGN_PAGE (sizeof (*grnd_alloc.states) + * grnd_alloc.cap); + + void **states = __mmap (NULL, states_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (states == MAP_FAILED) + { + __munmap (block, block_size); + return false; + } + + /* Atomically replace the old state, so if a fork happens the child + process will see a consistent free state buffer. The size might + not be updated, but it does not really matter since the buffer is + always increased. */ + atomic_store_relaxed (&grnd_alloc.states, states); + if (old_states != NULL) + __munmap (old_states, old_states_size); + + __set_vma_name (states, states_size, " glibc: getrandom states"); + grnd_alloc.cap = states_size / sizeof (*grnd_alloc.states); + } + + for (size_t i = 0; i < num; ++i) + { + /* States should not straddle a page. */ + if (((uintptr_t) block & (GLRO (dl_pagesize) - 1)) + + GLRO(dl_vdso_getrandom_state_size) > GLRO (dl_pagesize)) + block = ALIGN_PAGE (block); + grnd_alloc.states[i] = block; + block += GLRO(dl_vdso_getrandom_state_size); + } + grnd_alloc.len = num; + grnd_alloc.total += num; + + return true; +} + +/* Allocate an opaque state for vgetrandom. If the grnd_alloc does not have + any, mmap() another page of them using the vgetrandom parameters. */ +static void * +vgetrandom_get_state (void) +{ + void *state = NULL; + + /* The signal blocking avoid the potential issue where _Fork() (which is + async-signal-safe) is called with the lock taken. The function is + called only once during thread lifetime, so the overhead should be + minimal. */ + internal_sigset_t set; + internal_signal_block_all (&set); + __libc_lock_lock (grnd_alloc.lock); + + if (grnd_alloc.len > 0 || vgetrandom_get_state_alloc ()) + state = grnd_alloc.states[--grnd_alloc.len]; + + __libc_lock_unlock (grnd_alloc.lock); + internal_signal_restore_set (&set); + + return state; +} + +/* Returns true when vgetrandom is used successfully. Returns false if the + syscall fallback should be issued in the case the vDSO is not present, in + the case of reentrancy, or if any memory allocation fails. */ +static bool +getrandom_vdso (ssize_t *ret, void *buffer, size_t length, unsigned int flags) +{ + if (GLRO (dl_vdso_getrandom_state_size) == 0) + return false; + + struct pthread *self = THREAD_SELF; + + /* If the LSB of getrandom_buf is set, then this function is already being + called, and we have a reentrant call from a signal handler. In this case + fallback to the syscall. */ + void *state = READ_ONCE (self->getrandom_buf); + if (IS_RESERVED_PTR (state)) + return false; + WRITE_ONCE (self->getrandom_buf, RESERVE_PTR (state)); + + bool r = false; + if (state == NULL) + { + state = vgetrandom_get_state (); + if (state == NULL) + goto out; + } + + *ret = GLRO (dl_vdso_getrandom) (buffer, length, flags, state, + GLRO(dl_vdso_getrandom_state_size)); + if (INTERNAL_SYSCALL_ERROR_P (*ret)) + { + __set_errno (INTERNAL_SYSCALL_ERRNO (*ret)); + *ret = -1; + } + r = true; + +out: + WRITE_ONCE (self->getrandom_buf, state); + return r; +} +#endif + +/* Re-add the state state from CURP on the free list. */ +void +__getrandom_reset_state (struct pthread *curp) +{ +#ifdef HAVE_GETRANDOM_VSYSCALL + if (grnd_alloc.states == NULL || curp->getrandom_buf == NULL) + return; + grnd_alloc.states[grnd_alloc.len++] = RELEASE_PTR (curp->getrandom_buf); + curp->getrandom_buf = NULL; +#endif +} + +/* Called when a thread terminates, and adds its random buffer back into the + allocator pool for use in a future thread. */ +void +__getrandom_vdso_release (struct pthread *curp) +{ +#ifdef HAVE_GETRANDOM_VSYSCALL + if (curp->getrandom_buf == NULL) + return; + + __libc_lock_lock (grnd_alloc.lock); + grnd_alloc.states[grnd_alloc.len++] = curp->getrandom_buf; + __libc_lock_unlock (grnd_alloc.lock); +#endif +} + +/* Reset the internal lock state in case another thread has locked while + this thread calls fork. The stale thread states will be handled by + reclaim_stacks which calls __getrandom_reset_state on each thread. */ +void +__getrandom_fork_subprocess (void) +{ +#ifdef HAVE_GETRANDOM_VSYSCALL + grnd_alloc.lock = LLL_LOCK_INITIALIZER; +#endif +} + +ssize_t +__getrandom_nocancel (void *buffer, size_t length, unsigned int flags) +{ +#ifdef HAVE_GETRANDOM_VSYSCALL + ssize_t r; + if (getrandom_vdso (&r, buffer, length, flags)) + return r; +#endif + + return INLINE_SYSCALL_CALL (getrandom, buffer, length, flags); +} + /* Write up to LENGTH bytes of randomness starting at BUFFER. Return the number of bytes written, or -1 on error. */ ssize_t __getrandom (void *buffer, size_t length, unsigned int flags) { - return SYSCALL_CANCEL (getrandom, buffer, length, flags); +#ifdef HAVE_GETRANDOM_VSYSCALL + ssize_t r; + if (getrandom_vdso (&r, buffer, length, flags)) + return r; +#endif + + return INTERNAL_SYSCALL_CALL (getrandom, buffer, length, flags); } libc_hidden_def (__getrandom) weak_alias (__getrandom, getrandom) diff --git a/sysdeps/unix/sysv/linux/getrandom_vdso.h b/sysdeps/unix/sysv/linux/getrandom_vdso.h new file mode 100644 index 0000000000..d1ef690e50 --- /dev/null +++ b/sysdeps/unix/sysv/linux/getrandom_vdso.h @@ -0,0 +1,36 @@ +/* Linux getrandom vDSO support. + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef _GETRANDOM_VDSO_H +#define _GETRANDOM_VDSO_H + +#include +#include +#include + +/* Used to query the vDSO for the required mmap flags and the opaque + per-thread state size Defined by linux/random.h. */ +struct vgetrandom_opaque_params +{ + uint32_t size_of_opaque_state; + uint32_t mmap_prot; + uint32_t mmap_flags; + uint32_t reserved[13]; +}; + +#endif diff --git a/sysdeps/unix/sysv/linux/include/sys/random.h b/sysdeps/unix/sysv/linux/include/sys/random.h new file mode 100644 index 0000000000..3b96f1f287 --- /dev/null +++ b/sysdeps/unix/sysv/linux/include/sys/random.h @@ -0,0 +1,11 @@ +#ifndef _LINUX_SYS_RANDOM +#define _LINUX_SYS_RANDOm + +# ifndef _ISOMAC +# include + +extern void __getrandom_fork_subprocess (void) attribute_hidden; +extern void __getrandom_vdso_release (struct pthread *curp) attribute_hidden; +extern void __getrandom_reset_state (struct pthread *curp) attribute_hidden; +# endif +#endif diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 2a7585b73f..12f26912d3 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -27,6 +27,7 @@ #include #include #include +#include /* Non cancellable open syscall. */ __typeof (open) __open_nocancel; @@ -84,15 +85,17 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) } static inline ssize_t -__getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) +__getrandom_nocancel_direct (void *buf, size_t buflen, unsigned int flags) { return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags); } +__typeof (getrandom) __getrandom_nocancel attribute_hidden; + /* Non cancellable getrandom syscall that does not also set errno in case of failure. */ static inline ssize_t -__getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags) +__getrandom_nocancel_nostatus_direct (void *buf, size_t buflen, unsigned int flags) { return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); } diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h index a2b021bd86..7dc072ae2d 100644 --- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h +++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h @@ -376,6 +376,7 @@ # define HAVE_TIME_VSYSCALL "__vdso_time" # define HAVE_GETCPU_VSYSCALL "__vdso_getcpu" # define HAVE_CLOCK_GETRES64_VSYSCALL "__vdso_clock_getres" +# define HAVE_GETRANDOM_VSYSCALL "__vdso_getrandom" # define HAVE_CLONE3_WRAPPER 1