Message ID | 20230414151220.52040-2-cupertino.miranda@oracle.com |
---|---|
State | New |
Series | *** Created tunable to force small pages on stack allocation. |
* Cupertino Miranda via Libc-alpha:

> Created tunable glibc.pthread.stack_hugetlb to control when hugepages
> can be used for stack allocation.
> In case THP are enabled and glibc.pthread.stack_hugetlb is set to
> 0, glibc will madvise the kernel not to use allow hugepages for stack
> allocations.

Is this for the benefit of OpenJDK (among other things)?

In this case, we should expose this in a pthread_attr_t interface as
well, so that OpenJDK can activate this easily for the required threads.
This can be done in a follow-up patch.

Thanks,
Florian
>> Created tunable glibc.pthread.stack_hugetlb to control when hugepages
>> can be used for stack allocation.
>> In case THP are enabled and glibc.pthread.stack_hugetlb is set to
>> 0, glibc will madvise the kernel not to use allow hugepages for stack
>> allocations.
>
> Is this for the benefit of OpenJDK (among other things)?

Yes, it is actually. ;-)

> In this case, we should expose this in a pthread_attr_t interface as
> well, so that OpenJDK can activate this easily for the required threads.
> This can be done in a follow-up patch.

Totally agree.
Our initial concern in proposing such a change was to avoid big API
discussions when we were not sure whether there was another, simpler
solution to the problem.
I will work on such patches shortly if you think it is a good idea.

Thanks,
Cupertino
On 14/04/23 12:12, Cupertino Miranda via Libc-alpha wrote:
> Created tunable glibc.pthread.stack_hugetlb to control when hugepages
> can be used for stack allocation.
> In case THP are enabled and glibc.pthread.stack_hugetlb is set to
> 0, glibc will madvise the kernel not to use allow hugepages for stack
> allocations.
>
> Changed from v1:
> - removed the __malloc_thp_mode calls to check if hugetlb is
>   enabled.
>
> Changed from v2:
> - Added entry in manual/tunables.texi
> - Fixed tunable default to description
> - Code style corrections.
>
> Changes from v3:
> - Improve tunables.texi.
>
> Changes from v4:
> - Improved text in tunables.texi by suggestion of Adhemerval.
>
> Changes from v5:
> - Added a new entry in NEWS.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

> ---
>  NEWS                          |  3 +++
>  manual/tunables.texi          | 15 +++++++++++++++
>  nptl/allocatestack.c          |  6 ++++++
>  nptl/nptl-stack.c             |  1 +
>  nptl/nptl-stack.h             |  3 +++
>  nptl/pthread_mutex_conf.c     |  8 ++++++++
>  sysdeps/nptl/dl-tunables.list |  6 ++++++
>  7 files changed, 42 insertions(+)
Thanks!

Adhemerval Zanella Netto writes:

> LGTM, thanks.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On 20/04/23 17:21, Cupertino Miranda wrote:
>
> Thanks !
>
> Adhemerval Zanella Netto writes:

I also sent an updated patch to detect when the split will always
happen [1]. As I wrote in the patch, the tunable might still be
required in some specific scenarios, but it would be good to have
some heuristic to detect such scenarios if/when distros start to
set THP to be always enabled.

[1] https://patchwork.sourceware.org/project/glibc/patch/20230420172436.2013698-1-adhemerval.zanella@linaro.org/
> I also sent an updated patch to detect when the split will always
> happen [1]. As I wrote in the patch, the tunable might still be
> required in some specific scenarios, but it would be good to have
> some heuristic to detect such scenarios if/when distros start to
> active THP to be always enabled.
>
> [1] https://patchwork.sourceware.org/project/glibc/patch/20230420172436.2013698-1-adhemerval.zanella@linaro.org/
>

I have seen it and agree that it is useful.
I will attempt to test it tomorrow if you would be interested in the feedback.
I presume the condition is very similar to what I tested before.
diff --git a/NEWS b/NEWS
index 83d082afad..40964d2ee0 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,9 @@ Major new features:
 
 * PRIb* and PRIB* macros from C2X have been added to <inttypes.h>.
 
+* A new tunable, glibc.pthread.stack_hugetlb, can be used to disable
+  Transparent Huge Pages (THP) in stack allocation at pthread_create.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * In the Linux kernel for the hppa/parisc architecture some of the
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 70dd2264c5..130f94b2bc 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -459,6 +459,21 @@ registration on behalf of the application.
 Restartable sequences are a Linux-specific extension.
 @end deftp
 
+@deftp Tunable glibc.pthread.stack_hugetlb
+This tunable controls whether to use Huge Pages in the stacks created by
+@code{pthread_create}. This tunable only affects the stacks created by
+@theglibc{}, it has no effect on stack assigned with
+@code{pthread_attr_setstack}.
+
+The default is @samp{1} where the system default value is used. Setting
+its value to @code{0} enables the use of @code{madvise} with
+@code{MADV_NOHUGEPAGE} after stack creation with @code{mmap}.
+
+This is a memory utilization optimization, since internal glibc setup of either
+the thread descriptor and the guard page might force the kernel to move the
+thread stack originally backup by Huge Pages to default pages.
+@end deftp
+
 @node Hardware Capability Tunables
 @section Hardware Capability Tunables
 @cindex hardware capability tunables
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index c7adbccd6f..f9d8cdfd08 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -369,6 +369,12 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
           if (__glibc_unlikely (mem == MAP_FAILED))
             return errno;
 
+          /* Do madvise in case the tunable glibc.pthread.stack_hugetlb is
+             set to 0, disabling hugetlb. */
+          if (__glibc_unlikely (__nptl_stack_hugetlb == 0)
+              && __madvise (mem, size, MADV_NOHUGEPAGE) != 0)
+            return errno;
+
           /* SIZE is guaranteed to be greater than zero.
              So we can never get a null pointer back from mmap. */
           assert (mem != NULL);
diff --git a/nptl/nptl-stack.c b/nptl/nptl-stack.c
index 5eb7773575..e829711cb5 100644
--- a/nptl/nptl-stack.c
+++ b/nptl/nptl-stack.c
@@ -21,6 +21,7 @@
 #include <pthreadP.h>
 
 size_t __nptl_stack_cache_maxsize = 40 * 1024 * 1024;
+int32_t __nptl_stack_hugetlb = 1;
 
 void
 __nptl_stack_list_del (list_t *elem)
diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h
index 34f8bbb15e..cf90b27c2b 100644
--- a/nptl/nptl-stack.h
+++ b/nptl/nptl-stack.h
@@ -27,6 +27,9 @@
 /* Maximum size of the cache, in bytes. 40 MiB by default. */
 extern size_t __nptl_stack_cache_maxsize attribute_hidden;
 
+/* Should allow stacks to use hugetlb. (1) is default. */
+extern int32_t __nptl_stack_hugetlb;
+
 /* Check whether the stack is still used or not. */
 static inline bool
 __nptl_stack_in_use (struct pthread *pd)
diff --git a/nptl/pthread_mutex_conf.c b/nptl/pthread_mutex_conf.c
index 9133384d47..6517899718 100644
--- a/nptl/pthread_mutex_conf.c
+++ b/nptl/pthread_mutex_conf.c
@@ -44,6 +44,12 @@ TUNABLE_CALLBACK (set_stack_cache_size) (tunable_val_t *valp)
   __nptl_stack_cache_maxsize = valp->numval;
 }
 
+static void
+TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
+{
+  __nptl_stack_hugetlb = (int32_t) valp->numval;
+}
+
 void
 __pthread_tunables_init (void)
 {
@@ -51,4 +57,6 @@ __pthread_tunables_init (void)
                TUNABLE_CALLBACK (set_mutex_spin_count));
   TUNABLE_GET (stack_cache_size, size_t,
                TUNABLE_CALLBACK (set_stack_cache_size));
+  TUNABLE_GET (stack_hugetlb, int32_t,
+               TUNABLE_CALLBACK (set_stack_hugetlb));
 }
diff --git a/sysdeps/nptl/dl-tunables.list b/sysdeps/nptl/dl-tunables.list
index bd1ddb121d..4cde9500b6 100644
--- a/sysdeps/nptl/dl-tunables.list
+++ b/sysdeps/nptl/dl-tunables.list
@@ -33,5 +33,11 @@ glibc {
       maxval: 1
       default: 1
     }
+    stack_hugetlb {
+      type: INT_32
+      minval: 0
+      maxval: 1
+      default: 1
+    }
   }
 }