
[libstdc++] Refactor/cleanup of atomic wait implementation

Message ID 20210323190052.261853-1-rodgert@appliantology.com
State New

Commit Message

Thomas Rodgers March 23, 2021, 7 p.m. UTC
From: Thomas Rodgers <rodgert@twrodgers.com>

* This patch addresses jwakely's previous feedback.
* This patch also subsumes thiago.macieira@intel.com 's 'Uncontroversial
  improvements to C++20 wait-related implementation'.
* This patch also changes the atomic semaphore implementation to avoid
  checking for any waiters before a FUTEX_WAKE op.

This is a substantial rewrite of the atomic wait/notify (and timed wait
counterparts) implementation.

The previous __platform_wait looped on EINTR; however, this behavior is
not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
now controls whether wait/notify are implemented using a
platform-specific primitive or with a platform-agnostic mutex/condvar.
This patch only supplies a definition for Linux futexes. A future update
could add support for __ulock_wait/wake on Darwin, for instance.

The members of __waiters were lifted to a new base class. The members
are now arranged such that overall sizeof(__waiters_base) fits in two
cache lines (on platforms with at least 64 byte cache lines). The
definition will also use destructive_interference_size for this if it
is available.
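
A minimal sketch of the layout idea, not the patch's actual definition:
the type and member names below are hypothetical, and the 64-byte
fallback mirrors the cache-line assumption stated above. Each hot member
gets its own cache line, so the whole object spans two lines.

```cpp
#include <cstddef>
#include <new>

// Use the library's interference size when it is provided, else assume 64.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line =
  std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line = 64;
#endif

// Hypothetical stand-in for the waiter bookkeeping; names are illustrative.
struct waiters_sketch
{
  alignas(cache_line) unsigned waiter_count = 0; // updated on enter/leave
  alignas(cache_line) unsigned version = 0;      // bumped by notify
};
```

Because the constant may differ between compilers, sizeof(waiters_sketch)
is not fixed across toolchains in this sketch.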

The __waiters type is now specific to untimed waits. Timed waits have a
corresponding __timed_waiters type. Much of the code has been moved from
the previous __atomic_wait() free function to the __waiter_base template
and a __waiter derived type is provided to implement the un-timed wait
operations. A similar change has been made to the timed wait
implementation.

The __atomic_spin code has been extended to take a spin policy which is
invoked after the initial busy wait loop. The default policy is to
return from the spin. The timed wait code adds a timed backoff spinning
policy. The code from <thread> which implements this_thread::sleep_for
and sleep_until has been moved to a new <bits/std_thread_sleep.h> header,
which allows the thread sleep code to be consumed without pulling in the
whole of <thread>.
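
The spin-policy hook described above could be sketched roughly as
follows. The names and the iteration count are assumptions made for
illustration, not the values used by the patch: the predicate is polled
in a short busy loop, after which the policy decides whether polling
continues.

```cpp
// Default policy: give up as soon as the busy-wait budget is exhausted.
struct default_spin_policy_sketch
{
  bool operator()() const noexcept { return false; }
};

// Returns true if pred() became true while spinning, false otherwise.
template<typename Pred, typename Spin = default_spin_policy_sketch>
bool atomic_spin_sketch(Pred pred, Spin spin = Spin{})
{
  constexpr int busy_iterations = 16; // illustrative budget only
  for (int i = 0; i < busy_iterations; ++i)
    if (pred())
      return true;

  // The policy is consulted after the busy loop; a timed policy could
  // sleep with backoff here and return false once the deadline passes.
  while (spin())
    if (pred())
      return true;
  return false;
}
```

With the default policy the caller falls through to a blocking wait
after the busy loop; a timed policy keeps polling until it returns false.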

The entry points into the wait/notify code have been restructured to
support either -
   * Testing the current value of the atomic stored at the given address
     and waiting on a notification.
   * Applying a predicate to determine if the wait was satisfied.
The entry points were renamed to make it clear that the wait and wake
operations operate on addresses. The first variant takes the expected
value and a function which returns the current value that should be used
in comparison operations; these operations are named with a _v suffix
(i.e. 'value'). All atomic<_Tp> wait/notify operations use the first
variant. Barriers, latches and semaphores use the predicate variant.
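
As a rough illustration of the two shapes of entry point, here is a
polling-only sketch. The function names are hypothetical, the bytewise
comparison is an assumption about what a centralized compare might look
like, and a real implementation would block rather than poll.

```cpp
#include <cstring>

// One central definition of "the value is unchanged" (assumed bytewise).
template<typename T>
bool atomic_compare_sketch(const T& a, const T& b) noexcept
{ return std::memcmp(&a, &b, sizeof(T)) == 0; }

// Value variant: returns once the value produced by vfn differs from old.
template<typename T, typename ValFn>
void wait_address_v_sketch(const T& old, ValFn vfn)
{
  while (atomic_compare_sketch<T>(old, vfn()))
    { /* a real implementation would spin briefly, then block */ }
}

// Predicate variant: returns once pred() reports the wait is satisfied.
template<typename Pred>
void wait_address_sketch(Pred pred)
{
  while (!pred())
    { /* a real implementation would spin briefly, then block */ }
}
```

The value variant owns the comparison, so callers only supply a way to
read the current value; the predicate variant leaves the whole condition
to the caller.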

This change also centralizes what it means to compare values for the
purposes of atomic<T>::wait rather than scattering it through individual
predicates.

This change also centralizes the repetitive code which adjusts for
different user supplied clocks (this should be moved elsewhere
and all such adjustments should use a common implementation).

This change also removes the hashing of the pointer and uses
the pointer value directly for indexing into the waiters table.
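
The indexing scheme can be sketched as below. The shift and table size
follow the _S_for change in this patch; the free function and constant
names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t waiter_table_size = 16;

// Map an address to a slot: drop the two low bits (which are zero for
// 4-byte-aligned objects) and wrap around the table size.
inline std::size_t waiter_index_sketch(const void* addr) noexcept
{
  const auto key = reinterpret_cast<std::uintptr_t>(addr);
  return (key >> 2) % waiter_table_size;
}
```

Adjacent 4-byte objects therefore land in adjacent slots, and objects 64
bytes apart share a slot.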

libstdc++-v3/ChangeLog:
	* include/Makefile.am: Add new <bits/std_thread_sleep.h> header.
	* include/Makefile.in: Regenerate.
	* include/bits/atomic_base.h: Adjust all calls
	to __atomic_wait/__atomic_notify for new call signatures.
	* include/bits/atomic_wait.h: Extensive rewrite.
	* include/bits/atomic_timed_wait.h: Likewise.
	* include/bits/std_thread_sleep.h: New file.
	* include/bits/semaphore_base.h: Adjust all calls
	to __atomic_wait/__atomic_notify for new call signatures.
	* include/std/atomic: Likewise.
	* include/std/barrier: Likewise.
	* include/std/latch: Likewise.
	* testsuite/29_atomics/atomic/wait_notify/bool.cc: Simplify
	test.
	* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
	* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
	* testsuite/29_atomics/atomic_flag/wait_notify.cc: Likewise.
	* testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.
	* testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise.
	* testsuite/29_atomics/atomic_ref/wait_notify.cc: Likewise.
---
 libstdc++-v3/include/Makefile.am              |   1 +
 libstdc++-v3/include/Makefile.in              |   1 +
 libstdc++-v3/include/bits/atomic_base.h       |  36 +-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 444 ++++++++++------
 libstdc++-v3/include/bits/atomic_wait.h       | 475 ++++++++++++------
 libstdc++-v3/include/bits/semaphore_base.h    | 192 +++----
 libstdc++-v3/include/bits/std_thread_sleep.h  | 119 +++++
 libstdc++-v3/include/std/atomic               |  15 +-
 libstdc++-v3/include/std/barrier              |  13 +-
 libstdc++-v3/include/std/latch                |   8 +-
 libstdc++-v3/include/std/semaphore            |   9 +-
 libstdc++-v3/include/std/thread               |  68 +--
 .../29_atomics/atomic/wait_notify/bool.cc     |  37 +-
 .../29_atomics/atomic/wait_notify/generic.cc  |  19 +-
 .../29_atomics/atomic/wait_notify/pointers.cc |  36 +-
 .../29_atomics/atomic_flag/wait_notify/1.cc   |  37 +-
 .../29_atomics/atomic_float/wait_notify.cc    |  26 +-
 .../29_atomics/atomic_integral/wait_notify.cc |  73 +--
 .../29_atomics/atomic_ref/wait_notify.cc      |  76 +--
 19 files changed, 970 insertions(+), 715 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/std_thread_sleep.h

Comments

Jonathan Wakely April 15, 2021, 12:46 p.m. UTC | #1
On 23/03/21 12:00 -0700, Thomas Rodgers wrote:
>From: Thomas Rodgers <rodgert@twrodgers.com>
>
>* This patch addresses jwakely's previous feedback.
>* This patch also subsumes thiago.macieira@intel.com 's 'Uncontroversial

If this part is intended as part of the commit msg let's put Thiago's
name rather than email address, but I'm assuming this preamble isn't
intended for the commit anyway.

>  improvements to C++20 wait-related implementation'.
>* This patch also changes the atomic semaphore implementation to avoid
>  checking for any waiters before a FUTEX_WAKE op.
>
>This is a substantial rewrite of the atomic wait/notify (and timed wait
>counterparts) implementation.
>
>The previous __platform_wait looped on EINTR however this behavior is
>not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
>now controls whether wait/notify are implemented using a platform
>specific primitive or with a platform agnostic mutex/condvar. This
>patch only supplies a definition for linux futexes. A future update
>could add support __ulock_wait/wake on Darwin, for instance.
>
>The members of __waiters were lifted to a new base class. The members
>are now arranged such that overall sizeof(__waiters_base) fits in two
>cache lines (on platforms with at least 64 byte cache lines). The
>definition will also use destructive_interference_size for this if it
>is available.

N.B. that makes the ABI potentially different with different
compilers, e.g. if you compile it today it will use 64, but then you
compile it with some future version of Clang that defines the
interference sizes it might use a different value. That's OK for now,
but is something to be aware of and remember.


>The __waiters type is now specific to untimed waits. Timed waits have a
>corresponding __timed_waiters type. Much of the code has been moved from
>the previous __atomic_wait() free function to the __waiter_base template
>and a __waiter derived type is provided to implement the un-timed wait
>operations. A similar change has been made to the timed wait
>implementation.

While reading this code I keep getting confused between __waiter
singular and __waiters plural. Would something like __waiter_pool or
__waiters_mgr work instead of __waiters?

>The __atomic_spin code has been extended to take a spin policy which is
>invoked after the initial busy wait loop. The default policy is to
>return from the spin. The timed wait code adds a timed backoff spinning
>policy. The code from <thread> which implements this_thread::sleep_for,
>sleep_until has been moved to a new <bits/std_thread_sleep.h> header
>which allows the thread sleep code to be consumed without pulling in the
>whole of <thread>.

The new header is misnamed. The existing <bits/std_foo.h> headers all
define std::foo, but this doesn't define std::thread::sleep* or
std::thread_sleep*. I think <bits/thread_sleep.h> would be fine, or
<bits/this_thread_sleep.h> if you prefer that.

The original reason I introduced <bits/std_mutex.h> was that
<bits/mutex.h> seemed too likely to clash with something in glibc or
another project using "bits" as a prefix, so I figured std_mutex.h for
std::mutex would be safer. I had the same concern for <bits/thread.h>
and so that's <bits/std_thread.h> too, but I think thread_sleep is
probably sufficiently un-clashy, and this_thread_sleep definitely so.



>The entry points into the wait/notify code have been restructured to
>support either -
>   * Testing the current value of the atomic stored at the given address
>     and waiting on a notification.
>   * Applying a predicate to determine if the wait was satisfied.
>The entry points were renamed to make it clear that the wait and wake
>operations operate on addresses. The first variant takes the expected
>value and a function which returns the current value that should be used
>in comparison operations, these operations are named with a _v suffix
>(e.g. 'value'). All atomic<_Tp> wait/notify operations use the first
>variant. Barriers, latches and semaphores use the predicate variant.
>
>This change also centralizes what it means to compare values for the
>purposes of atomic<T>::wait rather than scattering through individual
>predicates.

I like this a lot more, thanks.


>diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
>index 2dc00676054..2e46691c59a 100644
>--- a/libstdc++-v3/include/bits/atomic_base.h
>+++ b/libstdc++-v3/include/bits/atomic_base.h
>@@ -1017,8 +1015,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       wait(const _Tp* __ptr, _Val<_Tp> __old,
> 	   memory_order __m = memory_order_seq_cst) noexcept
>       {
>-	std::__atomic_wait(__ptr, __old,
>-	    [=]() { return load(__ptr, __m) == __old; });
>+	std::__atomic_wait_address_v(__ptr, __old,
>+	    [__ptr, __m]() { return load(__ptr, __m); });

Pre-existing, but __ptr is dependent here so this needs to call
__atomic_impl::load to prevent ADL.
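
To illustrate the concern with a self-contained sketch (all names here
are hypothetical and unrelated to the real headers): an unqualified call
with a dependent pointer argument also searches the namespace of the
pointee type, so a user-provided load can be selected instead of the
library's.

```cpp
namespace lib_sketch
{
  namespace impl
  {
    template<typename T>
    T load(const T* p) { return *p; }

    // Unqualified call: ADL also considers the namespace of T.
    template<typename T>
    T read_unqualified(const T* p) { return load(p); }

    // Qualified call: only impl::load is considered.
    template<typename T>
    T read_qualified(const T* p) { return impl::load(p); }
  }
}

namespace user
{
  struct value { int v; };

  // A non-template overload found by ADL; it wins over the template.
  inline value load(const value*) { return value{-1}; }
}
```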



>diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
>index a0c5ef4374e..4b876236d2b 100644
>--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
>+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
>@@ -36,6 +36,7 @@
>
> #if __cpp_lib_atomic_wait
> #include <bits/functional_hash.h>
>+#include <bits/std_thread_sleep.h>
>
> #include <chrono>
>
>@@ -48,19 +49,34 @@ namespace std _GLIBCXX_VISIBILITY(default)
> {
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>-  enum class __atomic_wait_status { no_timeout, timeout };
>-
>   namespace __detail
>   {
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-    using __platform_wait_clock_t = chrono::steady_clock;
>+    using __wait_clock_t = chrono::steady_clock;
>
>-    template<typename _Duration>
>-      __atomic_wait_status
>-      __platform_wait_until_impl(__platform_wait_t* __addr,
>-				 __platform_wait_t __val,
>-				 const chrono::time_point<
>-					  __platform_wait_clock_t, _Duration>&
>+    template<typename _Clock, typename _Dur>
>+      __wait_clock_t::time_point
>+      __to_wait_clock(const chrono::time_point<_Clock, _Dur>& __atime) noexcept
>+      {
>+	const typename _Clock::time_point __c_entry = _Clock::now();
>+	const __wait_clock_t::time_point __s_entry = __wait_clock_t::now();

This is copy&pasted from elsewhere where the "s" prefix is for
system_clock (or steady_clock) so maybe here we want __w_entry
for wait clock?

>+	const auto __delta = __atime - __c_entry;
>+	return __s_entry + __delta;

I think this should be:

   using __w_dur = typename __wait_clock_t::duration;
   return __s_entry + chrono::ceil<__w_dur>(__delta);
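
A standalone sketch of that suggestion, using ordinary names instead of
the reserved ones (the function names are hypothetical). Rounding up
means the converted deadline is never earlier than the one requested,
whereas a truncating cast could produce an earlier one.

```cpp
#include <chrono>

using wait_clock_sketch = std::chrono::steady_clock;

// Round a duration up to the wait clock's tick so it is never shortened.
template<typename Rep, typename Period>
constexpr wait_clock_sketch::duration
to_wait_duration_sketch(const std::chrono::duration<Rep, Period>& d)
{ return std::chrono::ceil<wait_clock_sketch::duration>(d); }

// Re-express a deadline from another clock on the wait clock.
template<typename Clock, typename Dur>
wait_clock_sketch::time_point
to_wait_clock_sketch(const std::chrono::time_point<Clock, Dur>& atime)
{
  const auto c_entry = Clock::now();
  const auto w_entry = wait_clock_sketch::now();
  return w_entry + to_wait_duration_sketch(atime - c_entry);
}
```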


>+      }
>+
>+    template<typename _Dur>
>+      __wait_clock_t::time_point
>+      __to_wait_clock(const chrono::time_point<__wait_clock_t,
>+					       _Dur>& __atime) noexcept
>+      { return __atime; }

And strictly speaking, this should be:

   return chrono::ceil<typename __wait_clock_t::duration>(__atime);

but it only matters if somebody passes in a time_point with a
sub-nanosecond (or floating-point) duration. So I guess there's no
need to change it.


>-    struct __timed_waiters : __waiters
>+    struct __timed_waiters : __waiters_base
>     {
>-      template<typename _Clock, typename _Duration>
>-	__atomic_wait_status
>-	_M_do_wait_until(__platform_wait_t __version,
>-			 const chrono::time_point<_Clock, _Duration>& __atime)
>+      // returns true if wait ended before timeout
>+      template<typename _Clock, typename _Dur>
>+	bool
>+	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
>+			 const chrono::time_point<_Clock, _Dur>& __atime)
> 	{
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-	  return __detail::__platform_wait_until(&_M_ver, __version, __atime);
>+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>+	  return __platform_wait_until(__addr, __old, __atime);
> #else
>-	  __platform_wait_t __cur = 0;
>-	  __waiters::__lock_t __l(_M_mtx);
>-	  while (__cur <= __version)
>+	  __platform_wait_t __val;
>+	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	  if (__val == __old)
> 	    {
>-	      if (__detail::__cond_wait_until(_M_cv, _M_mtx, __atime)
>-		    == __atomic_wait_status::timeout)
>-		return __atomic_wait_status::timeout;
>-
>-	      __platform_wait_t __last = __cur;
>-	      __atomic_load(&_M_ver, &__cur, __ATOMIC_ACQUIRE);
>-	      if (__cur < __last)
>-		break; // break the loop if version overflows
>+	      lock_guard<mutex>__l(_M_mtx);

Missing space before the __l name.

>@@ -184,115 +238,238 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> #endif
>       }
>
>-      static __waiters&
>-      _S_for(const void* __t)
>+      static __waiters_base&
>+      _S_for(const void* __addr)

This can be noexcept.

>       {
>-	const unsigned char __mask = 0xf;
>-	static __waiters __w[__mask + 1];
>-
>-	auto __key = _Hash_impl::hash(__t) & __mask;
>+	constexpr uintptr_t __ct = 16;
>+	static __waiters_base __w[__ct];
>+	auto __key = (uintptr_t(__addr) >> 2) % __ct;
> 	return __w[__key];
>       }
>     };
>
>-    struct __waiter
>+    struct __waiters : __waiters_base
>     {
>-      __waiters& _M_w;
>-      __platform_wait_t _M_version;
>-
>-      template<typename _Tp>
>-	__waiter(const _Tp* __addr) noexcept
>-	  : _M_w(__waiters::_S_for(static_cast<const void*>(__addr)))
>-	  , _M_version(_M_w._M_enter_wait())
>-	{ }
>-
>-      ~__waiter()
>-      { _M_w._M_leave_wait(); }
>-
>-      void _M_do_wait() noexcept
>-      { _M_w._M_do_wait(_M_version); }
>+      void
>+      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
>+      {
>+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>+	__platform_wait(__addr, __old);
>+#else
>+	__platform_wait_t __val;
>+	__atomic_load(_M_addr, &__val, __ATOMIC_RELAXED);
>+	if (__val == __old)
>+	  {
>+	    lock_guard<mutex> __l(_M_mtx);
>+	    _M_cv.wait(_M_mtx);
>+	  }
>+#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
>+      }
>     };
>
>-    inline void
>-    __thread_relax() noexcept
>-    {
>-#if defined __i386__ || defined __x86_64__
>-      __builtin_ia32_pause();
>-#elif defined _GLIBCXX_USE_SCHED_YIELD
>-      __gthread_yield();
>-#endif
>-    }
>+    template<typename _Tp, typename _EntersWait>
>+      struct __waiter_base
>+      {
>+	using __waiter_type = _Tp;
>
>-    inline void
>-    __thread_yield() noexcept
>-    {
>-#if defined _GLIBCXX_USE_SCHED_YIELD
>-     __gthread_yield();
>-#endif
>-    }
>+	__waiter_type& _M_w;
>+	__platform_wait_t* _M_addr;
>
>+	template<typename _Up>
>+	  static __platform_wait_t*
>+	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
>+	  {
>+	    if constexpr (__platform_wait_uses_type<_Up>)
>+	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
>+	    else
>+	      return __b;
>+	  }
>+
>+	template<typename _Up>
>+	  static __waiter_type&
>+	  _S_for(const _Up* __addr)

Why is this a function template? It doesn't depend on _Up at all. It
just casts the _Up* to void* so might as well take a void* parameter,
no?

>+	  {
>+	    static_assert(sizeof(__waiter_type) == sizeof(__waiters_base));
>+	    auto& res = __waiters_base::_S_for(static_cast<const void*>(__addr));
>+	    return reinterpret_cast<__waiter_type&>(res);
>+	  }
>+
>+	template<typename _Up>
>+	  explicit __waiter_base(const _Up* __addr) noexcept
>+	    : _M_w(_S_for(__addr))
>+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
>+	  {
>+	    if constexpr (_EntersWait::value)
>+	      _M_w._M_enter_wait();
>+	  }
>+
>+	template<typename _Up>
>+	  __waiter_base(const _Up* __addr, std::false_type) noexcept

This constructor doesn't seem to be used anywhere.

>+	    : _M_w(_S_for(__addr))
>+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
>+	  { }
>+
>+	~__waiter_base()
>+	{
>+	  if constexpr (_EntersWait::value)
>+	    _M_w._M_leave_wait();
>+	}
>+
>+	void
>+	_M_notify(bool __all)
>+	{
>+	  if (_M_addr == &_M_w._M_ver)
>+	    __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
>+	  _M_w._M_notify(_M_addr, __all);
>+	}
>+
>+	template<typename _Up, typename _ValFn,
>+		 typename _Spin = __default_spin_policy>
>+	  static bool
>+	  _S_do_spin_v(__platform_wait_t* __addr,
>+		       const _Up& __old, _ValFn __vfn,
>+		       __platform_wait_t& __val,
>+		       _Spin __spin = _Spin{ })
>+	  {
>+	    auto const __pred = [=]
>+	      { return __atomic_compare(__old, __vfn()); };
>+
>+	    if constexpr (__platform_wait_uses_type<_Up>)
>+	      {
>+		__val == __old;
>+	      }
>+	    else
>+	      {
>+		__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	      }
>+	    return __atomic_spin(__pred, __spin);
>+	  }
>+
>+	template<typename _Up, typename _ValFn,
>+		 typename _Spin = __default_spin_policy>
>+	  bool
>+	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
>+		       __platform_wait_t& __val,
>+		       _Spin __spin = _Spin{ })
>+	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
>+
>+	template<typename _Pred,
>+		 typename _Spin = __default_spin_policy>
>+	  static bool
>+	  _S_do_spin(const __platform_wait_t* __addr,
>+		     _Pred __pred,
>+		     __platform_wait_t& __val,
>+		     _Spin __spin = _Spin{ })
>+	  {
>+	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
>+	    return __atomic_spin(__pred, __spin);
>+	  }
>+
>+	template<typename _Pred,
>+		 typename _Spin = __default_spin_policy>
>+	  bool
>+	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
>+	             _Spin __spin = _Spin{ })
>+	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
>+      };
>+
>+    template<typename _EntersWait>
>+      struct __waiter : __waiter_base<__waiters, _EntersWait>
>+      {
>+	using __base_type = __waiter_base<__waiters, _EntersWait>;

Why does the base class depend on _EntersWait? That causes all the
code in the base to be duplicated for the two specializations (true
and false). The only parts that differ are the constructor and
destructor, so the derived class could do that, couldn't it?

i.e. have

     template<typename _Tp>
       struct __waiter_base

as the base, then __waiter<_EntersWait> does the _M_enter_wait and
_M_leave_wait calls in its ctor and dtor.

That way we only instantiate two specializations of the base,
__waiter_base<__waiters> and __waiter_base<__timed_waiters>, rather
than four.
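
Roughly what I have in mind, as a simplified sketch: the names are
hypothetical, and the pool is passed by reference here rather than
looked up from an address as the patch does.

```cpp
#include <type_traits>

// Stand-in for the pool type; it only counts registered waiters.
struct pool_sketch
{
  int waiters = 0;
  void enter_wait() noexcept { ++waiters; }
  void leave_wait() noexcept { --waiters; }
};

// The base depends only on the pool type, so it is instantiated once
// per pool type regardless of the enters-wait flag.
template<typename Pool>
struct waiter_base_sketch
{
  Pool& pool;
  explicit waiter_base_sketch(Pool& p) noexcept : pool(p) { }
};

// Only the derived type knows whether it registers as a waiter.
template<typename EntersWait>
struct waiter_sketch : waiter_base_sketch<pool_sketch>
{
  explicit waiter_sketch(pool_sketch& p) noexcept
  : waiter_base_sketch<pool_sketch>(p)
  {
    if constexpr (EntersWait::value)
      this->pool.enter_wait();
  }

  ~waiter_sketch()
  {
    if constexpr (EntersWait::value)
      this->pool.leave_wait();
  }
};

using enters_wait_sketch = waiter_sketch<std::true_type>;
using bare_wait_sketch = waiter_sketch<std::false_type>;
```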





>   template<typename _Tp>
>     void
>-    __atomic_notify(const _Tp* __addr, bool __all) noexcept
>+    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
>     {
>-      using namespace __detail;
>-      auto& __w = __waiters::_S_for((void*)__addr);
>-      if (!__w._M_waiting())
>-	return;
>-
>-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>-      if constexpr (__platform_wait_uses_type<_Tp>)
>-	{
>-	  __platform_notify((__platform_wait_t*)(void*) __addr, __all);
>-	}
>-      else
>-#endif
>-	{
>-	  __w._M_notify(__all);
>-	}
>+      __detail::__bare_wait __w(__addr);

Should this be __enters_wait not __bare_wait ?

>+      __w._M_notify(__all);
>     }
>+
>+  // This call is to be used by atomic types which track contention externally
>+  inline void
>+  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
>+			       bool __all) noexcept
>+  {
>+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
>+    __detail::__platform_notify(__addr, __all);
>+#else
>+    __detail::__bare_wait __w(__addr);
>+    __w._M_notify(__all);
>+#endif
>+  }
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace std
> #endif // GTHREADS || LINUX_FUTEX
>diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
>index b65717e64d7..c21624e0988 100644
>--- a/libstdc++-v3/include/bits/semaphore_base.h
>+++ b/libstdc++-v3/include/bits/semaphore_base.h

[snip]

>-    private:
>-      alignas(__alignof__(_Tp)) _Tp _M_counter;
>-    };
>+  private:
>+    __detail::__platform_wait_t _M_counter;

We still need to force the alignment here.

Jakub said on IRC that m68k might have alignof(int) == 2, so we need
to increase that alignment to 4 to use it as a futex.

For the case where __platform_wait_t is int, we want alignas(4) but I
suppose on a hypothetical platform where we use a 64-bit type as
__platform_wait_t that would be wrong.

Maybe we want a new constant defined alongside the __platform_wait_t
which specifies the required alignment, then use:

   alignas(__detail::__platform_wait_alignment) __detail::__platform_wait_t
     _M_counter;

Or use alignas(atomic_ref<__platform_wait_t>::required_alignment).
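
A small sketch of the first option, with hypothetical names outside the
reserved namespace. It assumes the wait type is int and that a futex
word needs 4-byte alignment, as discussed above.

```cpp
#include <cstddef>

namespace detail_sketch
{
  using platform_wait_t = int;

  // Hypothetical constant defined next to the wait type.
  inline constexpr std::size_t platform_wait_alignment = 4;
}

struct semaphore_counter_sketch
{
  // Forces 4-byte alignment even where alignof(int) would be 2.
  alignas(detail_sketch::platform_wait_alignment)
    detail_sketch::platform_wait_t counter = 0;
};
```

Keeping the constant beside the type means a platform with a wider wait
type only has to change one place.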

Patch

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index f24a5489e8e..d651e040cf5 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -195,6 +195,7 @@  bits_headers = \
 	${bits_srcdir}/std_function.h \
 	${bits_srcdir}/std_mutex.h \
 	${bits_srcdir}/std_thread.h \
+	${bits_srcdir}/std_thread_sleep.h \
 	${bits_srcdir}/stl_algo.h \
 	${bits_srcdir}/stl_algobase.h \
 	${bits_srcdir}/stl_bvector.h \
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 2dc00676054..2e46691c59a 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -235,22 +235,21 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     wait(bool __old,
 	memory_order __m = memory_order_seq_cst) const noexcept
     {
-      std::__atomic_wait(&_M_i, static_cast<__atomic_flag_data_type>(__old),
-			 [__m, this, __old]()
-			 { return this->test(__m) != __old; });
+      std::__atomic_wait_address_v(&_M_i, static_cast<__atomic_flag_data_type>(__old),
+			 [__m, this] { return this->test(__m); });
     }
 
     // TODO add const volatile overload
 
     _GLIBCXX_ALWAYS_INLINE void
     notify_one() const noexcept
-    { std::__atomic_notify(&_M_i, false); }
+    { std::__atomic_notify_address(&_M_i, false); }
 
     // TODO add const volatile overload
 
     _GLIBCXX_ALWAYS_INLINE void
     notify_all() const noexcept
-    { std::__atomic_notify(&_M_i, true); }
+    { std::__atomic_notify_address(&_M_i, true); }
 
     // TODO add const volatile overload
 #endif // __cpp_lib_atomic_wait
@@ -609,22 +608,21 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       wait(__int_type __old,
 	  memory_order __m = memory_order_seq_cst) const noexcept
       {
-	std::__atomic_wait(&_M_i, __old,
-			   [__m, this, __old]
-			   { return this->load(__m) != __old; });
+	std::__atomic_wait_address_v(&_M_i, __old,
+			   [__m, this] { return this->load(__m); });
       }
 
       // TODO add const volatile overload
 
       _GLIBCXX_ALWAYS_INLINE void
       notify_one() const noexcept
-      { std::__atomic_notify(&_M_i, false); }
+      { std::__atomic_notify_address(&_M_i, false); }
 
       // TODO add const volatile overload
 
       _GLIBCXX_ALWAYS_INLINE void
       notify_all() const noexcept
-      { std::__atomic_notify(&_M_i, true); }
+      { std::__atomic_notify_address(&_M_i, true); }
 
       // TODO add const volatile overload
 #endif // __cpp_lib_atomic_wait
@@ -903,22 +901,22 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       wait(__pointer_type __old,
 	   memory_order __m = memory_order_seq_cst) noexcept
       {
-	std::__atomic_wait(&_M_p, __old,
-		      [__m, this, __old]()
-		      { return this->load(__m) != __old; });
+	std::__atomic_wait_address_v(&_M_p, __old,
+				     [__m, this]
+				     { return this->load(__m); });
       }
 
       // TODO add const volatile overload
 
       _GLIBCXX_ALWAYS_INLINE void
       notify_one() const noexcept
-      { std::__atomic_notify(&_M_p, false); }
+      { std::__atomic_notify_address(&_M_p, false); }
 
       // TODO add const volatile overload
 
       _GLIBCXX_ALWAYS_INLINE void
       notify_all() const noexcept
-      { std::__atomic_notify(&_M_p, true); }
+      { std::__atomic_notify_address(&_M_p, true); }
 
       // TODO add const volatile overload
 #endif // __cpp_lib_atomic_wait
@@ -1017,8 +1015,8 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       wait(const _Tp* __ptr, _Val<_Tp> __old,
 	   memory_order __m = memory_order_seq_cst) noexcept
       {
-	std::__atomic_wait(__ptr, __old,
-	    [=]() { return load(__ptr, __m) == __old; });
+	std::__atomic_wait_address_v(__ptr, __old,
+	    [__ptr, __m]() { return load(__ptr, __m); });
       }
 
       // TODO add const volatile overload
@@ -1026,14 +1024,14 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     template<typename _Tp>
       _GLIBCXX_ALWAYS_INLINE void
       notify_one(const _Tp* __ptr) noexcept
-      { std::__atomic_notify(__ptr, false); }
+      { std::__atomic_notify_address(__ptr, false); }
 
       // TODO add const volatile overload
 
     template<typename _Tp>
       _GLIBCXX_ALWAYS_INLINE void
       notify_all(const _Tp* __ptr) noexcept
-      { std::__atomic_notify(__ptr, true); }
+      { std::__atomic_notify_address(__ptr, true); }
 
       // TODO add const volatile overload
 #endif // __cpp_lib_atomic_wait
diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index a0c5ef4374e..4b876236d2b 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -36,6 +36,7 @@ 
 
 #if __cpp_lib_atomic_wait
 #include <bits/functional_hash.h>
+#include <bits/std_thread_sleep.h>
 
 #include <chrono>
 
@@ -48,19 +49,34 @@  namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-  enum class __atomic_wait_status { no_timeout, timeout };
-
   namespace __detail
   {
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-    using __platform_wait_clock_t = chrono::steady_clock;
+    using __wait_clock_t = chrono::steady_clock;
 
-    template<typename _Duration>
-      __atomic_wait_status
-      __platform_wait_until_impl(__platform_wait_t* __addr,
-				 __platform_wait_t __val,
-				 const chrono::time_point<
-					  __platform_wait_clock_t, _Duration>&
+    template<typename _Clock, typename _Dur>
+      __wait_clock_t::time_point
+      __to_wait_clock(const chrono::time_point<_Clock, _Dur>& __atime) noexcept
+      {
+	const typename _Clock::time_point __c_entry = _Clock::now();
+	const __wait_clock_t::time_point __s_entry = __wait_clock_t::now();
+	const auto __delta = __atime - __c_entry;
+	return __s_entry + __delta;
+      }
+
+    template<typename _Dur>
+      __wait_clock_t::time_point
+      __to_wait_clock(const chrono::time_point<__wait_clock_t,
+					       _Dur>& __atime) noexcept
+      { return __atime; }
+
+#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
+#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+    // returns true if wait ended before timeout
+    template<typename _Dur>
+      bool
+      __platform_wait_until_impl(const __platform_wait_t* __addr,
+				 __platform_wait_t __old,
+				 const chrono::time_point<__wait_clock_t, _Dur>&
 				      __atime) noexcept
       {
 	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
@@ -75,52 +91,55 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	auto __e = syscall (SYS_futex, __addr,
 			    static_cast<int>(__futex_wait_flags::
 						__wait_bitset_private),
-			    __val, &__rt, nullptr,
+			    __old, &__rt, nullptr,
 			    static_cast<int>(__futex_wait_flags::
 						__bitset_match_any));
-	if (__e && !(errno == EINTR || errno == EAGAIN || errno == ETIMEDOUT))
-	    std::terminate();
-	return (__platform_wait_clock_t::now() < __atime)
-	       ? __atomic_wait_status::no_timeout
-	       : __atomic_wait_status::timeout;
+
+	if (__e)
+	  {
+	    if ((errno != ETIMEDOUT) && (errno != EINTR)
+		&& (errno != EAGAIN))
+	      __throw_system_error(errno);
+	    return true;
+	  }
+	return false;
       }
 
-    template<typename _Clock, typename _Duration>
-      __atomic_wait_status
-      __platform_wait_until(__platform_wait_t* __addr, __platform_wait_t __val,
-			    const chrono::time_point<_Clock, _Duration>&
-				__atime)
+    // returns true if wait ended before timeout
+    template<typename _Clock, typename _Dur>
+      bool
+      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
+			    const chrono::time_point<_Clock, _Dur>& __atime)
       {
-	if constexpr (is_same_v<__platform_wait_clock_t, _Clock>)
+	if constexpr (is_same_v<__wait_clock_t, _Clock>)
 	  {
-	    return __detail::__platform_wait_until_impl(__addr, __val, __atime);
+	    return __platform_wait_until_impl(__addr, __old, __atime);
 	  }
 	else
 	  {
-	    const typename _Clock::time_point __c_entry = _Clock::now();
-	    const __platform_wait_clock_t::time_point __s_entry =
-		    __platform_wait_clock_t::now();
-	    const auto __delta = __atime - __c_entry;
-	    const auto __s_atime = __s_entry + __delta;
-	    if (__detail::__platform_wait_until_impl(__addr, __val, __s_atime)
-		  == __atomic_wait_status::no_timeout)
-	      return __atomic_wait_status::no_timeout;
-
-	    // We got a timeout when measured against __clock_t but
-	    // we need to check against the caller-supplied clock
-	    // to tell whether we should return a timeout.
-	    if (_Clock::now() < __atime)
-	      return __atomic_wait_status::no_timeout;
-	    return __atomic_wait_status::timeout;
+	    if (!__platform_wait_until_impl(__addr, __old,
+					    __to_wait_clock(__atime)))
+	      {
+		// We got a timeout when measured against __clock_t but
+		// we need to check against the caller-supplied clock
+		// to tell whether we should return a timeout.
+		if (_Clock::now() < __atime)
+		  return true;
+	      }
+	    return false;
 	  }
       }
-#else // ! FUTEX
+#else
+// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until()
+// if there is a more efficient primitive supported by the platform
+// (e.g. __ulock_wait())which is better than pthread_cond_clockwait
+#endif // ! PLATFORM_TIMED_WAIT
 
-#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-    template<typename _Duration>
-      __atomic_wait_status
+    // returns true if wait ended before timeout
+    template<typename _Dur>
+      bool
       __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<chrono::steady_clock, _Duration>& __atime)
+	  const chrono::time_point<chrono::steady_clock, _Dur>& __atime)
       {
 	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
 	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
@@ -131,40 +150,20 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    static_cast<long>(__ns.count())
 	  };
 
+#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
 	__cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
-
-	return (chrono::steady_clock::now() < __atime)
-	       ? __atomic_wait_status::no_timeout
-	       : __atomic_wait_status::timeout;
-      }
-#endif
-
-    template<typename _Duration>
-      __atomic_wait_status
-      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<chrono::system_clock, _Duration>& __atime)
-      {
-	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
-	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
-
-	__gthread_time_t __ts =
-	{
-	  static_cast<std::time_t>(__s.time_since_epoch().count()),
-	  static_cast<long>(__ns.count())
-	};
-
+	return chrono::steady_clock::now() < __atime;
+#else
 	__cv.wait_until(__mx, __ts);
-
-	return (chrono::system_clock::now() < __atime)
-	       ? __atomic_wait_status::no_timeout
-	       : __atomic_wait_status::timeout;
+	return chrono::system_clock::now() < __atime;
+#endif // ! _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
       }
 
-    // return true if timeout
-    template<typename _Clock, typename _Duration>
-      __atomic_wait_status
+    // returns true if wait ended before timeout
+    template<typename _Clock, typename _Dur>
+      bool
       __cond_wait_until(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<_Clock, _Duration>& __atime)
+	  const chrono::time_point<_Clock, _Dur>& __atime)
       {
 #ifndef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
 	using __clock_t = chrono::system_clock;
@@ -178,118 +177,255 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
 	else
 	  {
-	    const typename _Clock::time_point __c_entry = _Clock::now();
-	    const __clock_t::time_point __s_entry = __clock_t::now();
-	    const auto __delta = __atime - __c_entry;
-	    const auto __s_atime = __s_entry + __delta;
-	    if (__detail::__cond_wait_until_impl(__cv, __mx, __s_atime)
-		== __atomic_wait_status::no_timeout)
-	      return __atomic_wait_status::no_timeout;
-	    // We got a timeout when measured against __clock_t but
-	    // we need to check against the caller-supplied clock
-	    // to tell whether we should return a timeout.
-	    if (_Clock::now() < __atime)
-	      return __atomic_wait_status::no_timeout;
-	    return __atomic_wait_status::timeout;
+	    if (__cond_wait_until_impl(__cv, __mx,
+				       __to_wait_clock(__atime)))
+	      {
+		// We got a timeout when measured against __clock_t but
+		// we need to check against the caller-supplied clock
+		// to tell whether we should return a timeout.
+		if (_Clock::now() < __atime)
+		  return true;
+	      }
+	    return false;
 	  }
       }
-#endif // FUTEX
 
-    struct __timed_waiters : __waiters
+    struct __timed_waiters : __waiters_base
     {
-      template<typename _Clock, typename _Duration>
-	__atomic_wait_status
-	_M_do_wait_until(__platform_wait_t __version,
-			 const chrono::time_point<_Clock, _Duration>& __atime)
+      // returns true if wait ended before timeout
+      template<typename _Clock, typename _Dur>
+	bool
+	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
+			 const chrono::time_point<_Clock, _Dur>& __atime)
 	{
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	  return __detail::__platform_wait_until(&_M_ver, __version, __atime);
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	  return __platform_wait_until(__addr, __old, __atime);
 #else
-	  __platform_wait_t __cur = 0;
-	  __waiters::__lock_t __l(_M_mtx);
-	  while (__cur <= __version)
+	  __platform_wait_t __val;
+	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+	  if (__val == __old)
 	    {
-	      if (__detail::__cond_wait_until(_M_cv, _M_mtx, __atime)
-		    == __atomic_wait_status::timeout)
-		return __atomic_wait_status::timeout;
-
-	      __platform_wait_t __last = __cur;
-	      __atomic_load(&_M_ver, &__cur, __ATOMIC_ACQUIRE);
-	      if (__cur < __last)
-		break; // break the loop if version overflows
+	      lock_guard<mutex> __l(_M_mtx);
+	      return __cond_wait_until(_M_cv, _M_mtx, __atime);
 	    }
-	  return __atomic_wait_status::no_timeout;
-#endif
+#endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
 	}
+    };
 
-      static __timed_waiters&
-      _S_timed_for(void* __t)
+    struct __timed_backoff_spin_policy
+    {
+      __wait_clock_t::time_point _M_deadline;
+      __wait_clock_t::time_point _M_t0;
+
+      template<typename _Clock, typename _Dur>
+	__timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur>
+				      __deadline = _Clock::time_point::max(),
+				    chrono::time_point<_Clock, _Dur>
+				      __t0 = _Clock::now()) noexcept
+	  : _M_deadline(__to_wait_clock(__deadline))
+	  , _M_t0(__to_wait_clock(__t0))
+	{ }
+
+      bool
+      operator()() const noexcept
       {
-	static_assert(sizeof(__timed_waiters) == sizeof(__waiters));
-	return static_cast<__timed_waiters&>(__waiters::_S_for(__t));
+	using namespace literals::chrono_literals;
+	auto __now = __wait_clock_t::now();
+	if (_M_deadline <= __now)
+	  return false;
+
+	auto __elapsed = __now - _M_t0;
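+	// Back off progressively: yield once more than 4us have elapsed,
+	// sleep for half the elapsed time after 64us, and sleep in fixed
+	// 64ms steps after 128ms.  Returning true asks __atomic_spin to
+	// re-check the predicate; returning false ends the spin.
+	if (__elapsed > 128ms)
+	  this_thread::sleep_for(64ms);
+	else if (__elapsed > 64us)
+	  this_thread::sleep_for(__elapsed / 2);
+	else if (__elapsed > 4us)
+	  __thread_yield();
+	else
+	  return false;
+
+	return true;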
       }
     };
+
+    template<typename _EntersWait>
+      struct __timed_waiter : __waiter_base<__timed_waiters, _EntersWait>
+      {
+	using __base_type = __waiter_base<__timed_waiters, _EntersWait>;
+
+	template<typename _Tp>
+	  __timed_waiter(const _Tp* __addr) noexcept
+	  : __base_type(__addr)
+	{ }
+
+	// returns true if wait ended before timeout
+	template<typename _Tp, typename _ValFn,
+		 typename _Clock, typename _Dur>
+	  bool
+	  _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
+			     const chrono::time_point<_Clock, _Dur>&
+								__atime) noexcept
+	  {
+	    __platform_wait_t __val;
+	    if (__base_type::_M_do_spin_v(__old, std::move(__vfn), __val,
+					  __timed_backoff_spin_policy(__atime)))
+	      return true;
+	    return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
+	  }
+
+	// returns true if wait ended before timeout
+	template<typename _Pred,
+		 typename _Clock, typename _Dur>
+	  bool
+	  _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
+			  const chrono::time_point<_Clock, _Dur>&
+							      __atime) noexcept
+	  {
+	    for (auto __now = _Clock::now(); __now < __atime;
+		  __now = _Clock::now())
+	      {
+		if (__base_type::_M_w._M_do_wait_until(
+		      __base_type::_M_addr, __val, __atime)
+		    && __pred())
+		  return true;
+
+		if (__base_type::_M_do_spin(__pred, __val,
+			       __timed_backoff_spin_policy(__atime, __now)))
+		  return true;
+	      }
+	    return false;
+	  }
+
+	// returns true if wait ended before timeout
+	template<typename _Pred,
+		 typename _Clock, typename _Dur>
+	  bool
+	  _M_do_wait_until(_Pred __pred,
+			   const chrono::time_point<_Clock, _Dur>&
+								__atime) noexcept
+	  {
+	    __platform_wait_t __val;
+	    if (__base_type::_M_do_spin(__pred, __val,
+				        __timed_backoff_spin_policy(__atime)))
+	      return true;
+	    return _M_do_wait_until(__pred, __val, __atime);
+	  }
+
+	template<typename _Tp, typename _ValFn,
+		 typename _Rep, typename _Period>
+	  bool
+	  _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
+			   const chrono::duration<_Rep, _Period>&
+								__rtime) noexcept
+	  {
+	    __platform_wait_t __val;
+	    if (__base_type::_M_do_spin_v(__old, std::move(__vfn), __val))
+	      return true;
+
+	    if (!__rtime.count())
+	      return false; // no rtime supplied, and spin did not acquire
+
+	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
+
+	    return __base_type::_M_w._M_do_wait_until(
+					  __base_type::_M_addr,
+					  __val,
+					  chrono::steady_clock::now() + __reltime);
+	  }
+
+	template<typename _Pred,
+		 typename _Rep, typename _Period>
+	  bool
+	  _M_do_wait_for(_Pred __pred,
+			 const chrono::duration<_Rep, _Period>& __rtime) noexcept
+	  {
+	    __platform_wait_t __val;
+	    if (__base_type::_M_do_spin(__pred, __val))
+	      return true;
+
+	    if (!__rtime.count())
+	      return false; // no rtime supplied, and spin did not acquire
+
+	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
+
+	    return _M_do_wait_until(__pred, __val,
+				    chrono::steady_clock::now() + __reltime);
+	  }
+      };
+
+    using __enters_timed_wait = __timed_waiter<std::true_type>;
+    using __bare_timed_wait = __timed_waiter<std::false_type>;
   } // namespace __detail
 
-  template<typename _Tp, typename _Pred,
-	   typename _Clock, typename _Duration>
+  // returns true if wait ended before timeout
+  template<typename _Tp, typename _ValFn,
+	   typename _Clock, typename _Dur>
     bool
-    __atomic_wait_until(const _Tp* __addr, _Tp __old, _Pred __pred,
-			const chrono::time_point<_Clock, _Duration>&
+    __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
+			const chrono::time_point<_Clock, _Dur>&
 			    __atime) noexcept
     {
-      using namespace __detail;
+      __detail::__enters_timed_wait __w{__addr};
+      return __w._M_do_wait_until_v(__old, __vfn, __atime);
+    }
 
-      if (std::__atomic_spin(__pred))
-	return true;
+  template<typename _Tp, typename _Pred,
+	   typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
+				const chrono::time_point<_Clock, _Dur>&
+							      __atime) noexcept
+    {
+      __detail::__enters_timed_wait __w{__addr};
+      return __w._M_do_wait_until(__pred, __atime);
+    }
 
-      auto& __w = __timed_waiters::_S_timed_for((void*)__addr);
-      auto __version = __w._M_enter_wait();
-      do
-	{
-	  __atomic_wait_status __res;
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	  if constexpr (__platform_wait_uses_type<_Tp>)
-	    {
-	      __res = __detail::__platform_wait_until((__platform_wait_t*)(void*) __addr,
-						      __old, __atime);
-	    }
-	  else
-#endif
-	    {
-	      __res = __w._M_do_wait_until(__version, __atime);
-	    }
-	  if (__res == __atomic_wait_status::timeout)
-	    return false;
-	}
-      while (!__pred() && __atime < _Clock::now());
-      __w._M_leave_wait();
+  template<typename _Pred,
+	   typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until_bare(const __detail::__platform_wait_t* __addr,
+				_Pred __pred,
+				const chrono::time_point<_Clock, _Dur>&
+							      __atime) noexcept
+    {
+      __detail::__bare_timed_wait __w{__addr};
+      return __w._M_do_wait_until(__pred, __atime);
+    }
 
-      // if timed out, return false
-      return (_Clock::now() < __atime);
+  template<typename _Tp, typename _ValFn,
+	   typename _Rep, typename _Period>
+    bool
+    __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
+		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
+    {
+      __detail::__enters_timed_wait __w{__addr};
+      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
     }
 
   template<typename _Tp, typename _Pred,
 	   typename _Rep, typename _Period>
     bool
-    __atomic_wait_for(const _Tp* __addr, _Tp __old, _Pred __pred,
+    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
 		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
     {
-      using namespace __detail;
 
-      if (std::__atomic_spin(__pred))
-	return true;
+      __detail::__enters_timed_wait __w{__addr};
+      return __w._M_do_wait_for(__pred, __rtime);
+    }
 
-      if (!__rtime.count())
-	return false; // no rtime supplied, and spin did not acquire
-
-      using __dur = chrono::steady_clock::duration;
-      auto __reltime = chrono::duration_cast<__dur>(__rtime);
-      if (__reltime < __rtime)
-	++__reltime;
-
-      return __atomic_wait_until(__addr, __old, std::move(__pred),
-				 chrono::steady_clock::now() + __reltime);
+  template<typename _Pred,
+	   typename _Rep, typename _Period>
+    bool
+    __atomic_wait_address_for_bare(const __detail::__platform_wait_t* __addr,
+			_Pred __pred,
+			const chrono::duration<_Rep, _Period>& __rtime) noexcept
+    {
+      __detail::__bare_timed_wait __w{__addr};
+      return __w._M_do_wait_for(__pred, __rtime);
     }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
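
For reviewers: the backoff schedule used by __timed_backoff_spin_policy above can be checked in isolation. What follows is only a minimal sketch under stated assumptions, not code from the patch. The names backoff_step and classify_backoff are hypothetical, and the function only reports which step the policy would take for a given elapsed time instead of sleeping or yielding.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical names for illustration only; not part of libstdc++.
enum class backoff_step { stop, yield, sleep_half, sleep_64ms };

// Mirrors the thresholds of the policy: stop spinning at or below 4us,
// yield up to 64us, sleep for half the elapsed time up to 128ms, and
// sleep in fixed 64ms steps beyond that.
constexpr backoff_step
classify_backoff(std::chrono::nanoseconds elapsed) noexcept
{
  using namespace std::chrono_literals;
  if (elapsed > 128ms)
    return backoff_step::sleep_64ms;
  if (elapsed > 64us)
    return backoff_step::sleep_half;
  if (elapsed > 4us)
    return backoff_step::yield;
  return backoff_step::stop;
}
```

The comparisons are strict, so an elapsed time of exactly 64us still yields and exactly 128ms still sleeps for half the elapsed time.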
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 1a0f0943ebd..9b69cf88a52 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -44,12 +44,10 @@ 
 # include <unistd.h>
 # include <syscall.h>
 # include <bits/functexcept.h>
-// TODO get this from Autoconf
-# define _GLIBCXX_HAVE_LINUX_FUTEX_PRIVATE 1
-#else
-# include <bits/std_mutex.h>  // std::mutex, std::__condvar
 #endif
 
+# include <bits/std_mutex.h>  // std::mutex, std::__condvar
+
 #define __cpp_lib_atomic_wait 201907L
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -57,20 +55,27 @@  namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
-    using __platform_wait_t = int;
-
-    constexpr auto __atomic_spin_count_1 = 16;
-    constexpr auto __atomic_spin_count_2 = 12;
-
-    template<typename _Tp>
-      inline constexpr bool __platform_wait_uses_type
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	= is_same_v<remove_cv_t<_Tp>, __platform_wait_t>;
+    using __platform_wait_t = int;
 #else
-	= false;
+    using __platform_wait_t = uint64_t;
+#endif
+  } // namespace __detail
+
+  template<typename _Tp>
+    inline constexpr bool __platform_wait_uses_type
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      = is_scalar_v<_Tp>
+	&& ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
+	&& (alignof(_Tp) >= alignof(__detail::__platform_wait_t)));
+#else
+      = false;
 #endif
 
+  namespace __detail
+  {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
+#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
     enum class __futex_wait_flags : int
     {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX_PRIVATE
@@ -93,16 +98,13 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       void
       __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
       {
-	for(;;)
-	  {
-	    auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
-				  static_cast<int>(__futex_wait_flags::__wait_private),
-				    __val, nullptr);
-	    if (!__e || errno == EAGAIN)
-	      break;
-	    else if (errno != EINTR)
-	      __throw_system_error(__e);
-	  }
+	auto __e = syscall (SYS_futex, static_cast<const void*>(__addr),
+			    static_cast<int>(__futex_wait_flags::__wait_private),
+			    __val, nullptr);
+	if (!__e || errno == EAGAIN)
+	  return;
+	if (errno != EINTR)
+	  __throw_system_error(errno);
       }
 
     template<typename _Tp>
@@ -110,72 +112,124 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __platform_notify(const _Tp* __addr, bool __all) noexcept
       {
 	syscall (SYS_futex, static_cast<const void*>(__addr),
-		  static_cast<int>(__futex_wait_flags::__wake_private),
-		    __all ? INT_MAX : 1);
+		 static_cast<int>(__futex_wait_flags::__wake_private),
+		 __all ? INT_MAX : 1);
       }
+#else
+// define _GLIBCXX_HAVE_PLATFORM_WAIT and implement __platform_wait()
+// and __platform_notify() if there is a more efficient primitive supported
+// by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
+// a mutex/condvar based wait
 #endif
 
-    struct __waiters
+    inline void
+    __thread_yield() noexcept
     {
-      alignas(64) __platform_wait_t _M_ver = 0;
-      alignas(64) __platform_wait_t _M_wait = 0;
+#if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
+      __gthread_yield();
+#endif
+    }
 
-#ifndef _GLIBCXX_HAVE_LINUX_FUTEX
-      using __lock_t = lock_guard<mutex>;
-      mutex _M_mtx;
-      __condvar _M_cv;
+    inline void
+    __thread_relax() noexcept
+    {
+#if defined __i386__ || defined __x86_64__
+      __builtin_ia32_pause();
+#else
+      __thread_yield();
+#endif
+    }
 
-      __waiters() noexcept = default;
+    constexpr auto __atomic_spin_count_1 = 12;
+    constexpr auto __atomic_spin_count_2 = 4;
+
+    struct __default_spin_policy
+    {
+      bool
+      operator()() const noexcept
+      { return false; }
+    };
+
+    template<typename _Pred,
+	     typename _Spin = __default_spin_policy>
+      bool
+      __atomic_spin(_Pred& __pred, _Spin __spin = _Spin{ }) noexcept
+      {
+	for (auto __i = 0; __i < __atomic_spin_count_1; ++__i)
+	  {
+	    if (__pred())
+	      return true;
+	    __detail::__thread_relax();
+	  }
+
+	for (auto __i = 0; __i < __atomic_spin_count_2; ++__i)
+	  {
+	    if (__pred())
+	      return true;
+	    __detail::__thread_yield();
+	  }
+
+	while (__spin())
+	  {
+	    if (__pred())
+	      return true;
+	  }
+
+	return false;
+      }
+
+    template<typename _Tp>
+      bool __atomic_compare(const _Tp& __a, const _Tp& __b)
+      {
+	// Returns true if the values differ.  TODO: ignore padding bits.
+	return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) != 0;
+      }
+
+    struct __waiters_base
+    {
+#ifdef __cpp_lib_hardware_interference_size
+      static constexpr auto _S_align = hardware_destructive_interference_size;
+#else
+      static constexpr auto _S_align = 64;
 #endif
 
-      __platform_wait_t
+      alignas(_S_align) __platform_wait_t _M_wait = 0;
+
+#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
+      mutex _M_mtx;
+#endif
+
+      alignas(_S_align) __platform_wait_t _M_ver = 0;
+
+#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __condvar _M_cv;
+#endif
+      __waiters_base() = default;
+
+      void
       _M_enter_wait() noexcept
-      {
-	__platform_wait_t __res;
-	__atomic_load(&_M_ver, &__res, __ATOMIC_ACQUIRE);
-	__atomic_fetch_add(&_M_wait, 1, __ATOMIC_ACQ_REL);
-	return __res;
-      }
+      { __atomic_fetch_add(&_M_wait, 1, __ATOMIC_ACQ_REL); }
 
       void
       _M_leave_wait() noexcept
-      {
-	__atomic_fetch_sub(&_M_wait, 1, __ATOMIC_ACQ_REL);
-      }
-
-      void
-      _M_do_wait(__platform_wait_t __version) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	__platform_wait(&_M_ver, __version);
-#else
-	__platform_wait_t __cur = 0;
-	while (__cur <= __version)
-	  {
-	    __waiters::__lock_t __l(_M_mtx);
-	    _M_cv.wait(_M_mtx);
-	    __platform_wait_t __last = __cur;
-	    __atomic_load(&_M_ver, &__cur, __ATOMIC_ACQUIRE);
-	    if (__cur < __last)
-	      break; // break the loop if version overflows
-	  }
-#endif
-      }
+      { __atomic_fetch_sub(&_M_wait, 1, __ATOMIC_ACQ_REL); }
 
       bool
       _M_waiting() const noexcept
       {
 	__platform_wait_t __res;
 	__atomic_load(&_M_wait, &__res, __ATOMIC_ACQUIRE);
-	return __res;
+	return __res > 0;
       }
 
       void
-      _M_notify(bool __all) noexcept
+      _M_notify(const __platform_wait_t* __addr, bool __all) noexcept
       {
-	__atomic_fetch_add(&_M_ver, 1, __ATOMIC_ACQ_REL);
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-	__platform_notify(&_M_ver, __all);
+	if (!_M_waiting())
+	  return;
+
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	__platform_notify(__addr, __all);
 #else
 	if (__all)
 	  _M_cv.notify_all();
@@ -184,115 +238,238 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
       }
 
-      static __waiters&
-      _S_for(const void* __t)
+      static __waiters_base&
+      _S_for(const void* __addr)
       {
-	const unsigned char __mask = 0xf;
-	static __waiters __w[__mask + 1];
-
-	auto __key = _Hash_impl::hash(__t) & __mask;
+	constexpr uintptr_t __ct = 16;
+	static __waiters_base __w[__ct];
+	auto __key = (uintptr_t(__addr) >> 2) % __ct;
 	return __w[__key];
       }
     };
 
-    struct __waiter
+    struct __waiters : __waiters_base
     {
-      __waiters& _M_w;
-      __platform_wait_t _M_version;
-
-      template<typename _Tp>
-	__waiter(const _Tp* __addr) noexcept
-	  : _M_w(__waiters::_S_for(static_cast<const void*>(__addr)))
-	  , _M_version(_M_w._M_enter_wait())
-	{ }
-
-      ~__waiter()
-      { _M_w._M_leave_wait(); }
-
-      void _M_do_wait() noexcept
-      { _M_w._M_do_wait(_M_version); }
+      void
+      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
+      {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	__platform_wait(__addr, __old);
+#else
+	__platform_wait_t __val;
+	__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+	if (__val == __old)
+	  {
+	    lock_guard<mutex> __l(_M_mtx);
+	    _M_cv.wait(_M_mtx);
+	  }
+#endif // _GLIBCXX_HAVE_PLATFORM_WAIT
+      }
     };
 
-    inline void
-    __thread_relax() noexcept
-    {
-#if defined __i386__ || defined __x86_64__
-      __builtin_ia32_pause();
-#elif defined _GLIBCXX_USE_SCHED_YIELD
-      __gthread_yield();
-#endif
-    }
+    template<typename _Tp, typename _EntersWait>
+      struct __waiter_base
+      {
+	using __waiter_type = _Tp;
 
-    inline void
-    __thread_yield() noexcept
-    {
-#if defined _GLIBCXX_USE_SCHED_YIELD
-     __gthread_yield();
-#endif
-    }
+	__waiter_type& _M_w;
+	__platform_wait_t* _M_addr;
 
+	template<typename _Up>
+	  static __platform_wait_t*
+	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
+	  {
+	    if constexpr (__platform_wait_uses_type<_Up>)
+	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
+	    else
+	      return __b;
+	  }
+
+	template<typename _Up>
+	  static __waiter_type&
+	  _S_for(const _Up* __addr)
+	  {
+	    static_assert(sizeof(__waiter_type) == sizeof(__waiters_base));
+	    auto& __res = __waiters_base::_S_for(static_cast<const void*>(__addr));
+	    return reinterpret_cast<__waiter_type&>(__res);
+	  }
+
+	template<typename _Up>
+	  explicit __waiter_base(const _Up* __addr) noexcept
+	    : _M_w(_S_for(__addr))
+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
+	  {
+	    if constexpr (_EntersWait::value)
+	      _M_w._M_enter_wait();
+	  }
+
+	template<typename _Up>
+	  __waiter_base(const _Up* __addr, std::false_type) noexcept
+	    : _M_w(_S_for(__addr))
+	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
+	  { }
+
+	~__waiter_base()
+	{
+	  if constexpr (_EntersWait::value)
+	    _M_w._M_leave_wait();
+	}
+
+	void
+	_M_notify(bool __all)
+	{
+	  if (_M_addr == &_M_w._M_ver)
+	    __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
+	  _M_w._M_notify(_M_addr, __all);
+	}
+
+	template<typename _Up, typename _ValFn,
+		 typename _Spin = __default_spin_policy>
+	  static bool
+	  _S_do_spin_v(__platform_wait_t* __addr,
+		       const _Up& __old, _ValFn __vfn,
+		       __platform_wait_t& __val,
+		       _Spin __spin = _Spin{ })
+	  {
+	    auto const __pred = [=]
+	      { return __atomic_compare(__old, __vfn()); };
+
+	    if constexpr (__platform_wait_uses_type<_Up>)
+	      {
+		__builtin_memcpy(&__val, &__old, sizeof(__val));
+	      }
+	    else
+	      {
+		__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+	      }
+	    return __atomic_spin(__pred, __spin);
+	  }
+
+	template<typename _Up, typename _ValFn,
+		 typename _Spin = __default_spin_policy>
+	  bool
+	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
+		       __platform_wait_t& __val,
+		       _Spin __spin = _Spin{ })
+	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
+
+	template<typename _Pred,
+		 typename _Spin = __default_spin_policy>
+	  static bool
+	  _S_do_spin(const __platform_wait_t* __addr,
+		     _Pred __pred,
+		     __platform_wait_t& __val,
+		     _Spin __spin = _Spin{ })
+	  {
+	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+	    return __atomic_spin(__pred, __spin);
+	  }
+
+	template<typename _Pred,
+		 typename _Spin = __default_spin_policy>
+	  bool
+	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
+	             _Spin __spin = _Spin{ })
+	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
+      };
+
+    template<typename _EntersWait>
+      struct __waiter : __waiter_base<__waiters, _EntersWait>
+      {
+	using __base_type = __waiter_base<__waiters, _EntersWait>;
+
+	template<typename _Tp>
+	  explicit __waiter(const _Tp* __addr) noexcept
+	    : __base_type(__addr)
+	  { }
+
+	template<typename _Tp, typename _ValFn>
+	  void
+	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
+	  {
+	    __platform_wait_t __val;
+	    if (__base_type::_M_do_spin_v(__old, __vfn, __val))
+	      return;
+	    __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
+	  }
+
+	template<typename _Pred>
+	  void
+	  _M_do_wait(_Pred __pred) noexcept
+	  {
+	    do
+	      {
+		__platform_wait_t __val;
+		if (__base_type::_M_do_spin(__pred, __val))
+		  return;
+		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
+	      }
+	    while (!__pred());
+	  }
+      };
+
+    using __enters_wait = __waiter<std::true_type>;
+    using __bare_wait = __waiter<std::false_type>;
   } // namespace __detail
 
-  template<typename _Pred>
-    bool
-    __atomic_spin(_Pred& __pred) noexcept
+  template<typename _Tp, typename _ValFn>
+    void
+    __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
+			    _ValFn __vfn) noexcept
     {
-      for (auto __i = 0; __i < __detail::__atomic_spin_count_1; ++__i)
-	{
-	  if (__pred())
-	    return true;
-
-	  if (__i < __detail::__atomic_spin_count_2)
-	    __detail::__thread_relax();
-	  else
-	    __detail::__thread_yield();
-	}
-      return false;
+      __detail::__enters_wait __w(__addr);
+      __w._M_do_wait_v(__old, __vfn);
     }
 
   template<typename _Tp, typename _Pred>
     void
-    __atomic_wait(const _Tp* __addr, _Tp __old, _Pred __pred) noexcept
+    __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
     {
-      using namespace __detail;
-      if (std::__atomic_spin(__pred))
-	return;
+      __detail::__enters_wait __w(__addr);
+      __w._M_do_wait(__pred);
+    }
 
-      __waiter __w(__addr);
-      while (!__pred())
+  // This call is to be used by atomic types which track contention externally
+  template<typename _Pred>
+    void
+    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
+			       _Pred __pred) noexcept
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      do
 	{
-	  if constexpr (__platform_wait_uses_type<_Tp>)
-	    {
-	      __platform_wait(__addr, __old);
-	    }
-	  else
-	    {
-	      // TODO support timed backoff when this can be moved into the lib
-	      __w._M_do_wait();
-	    }
+	  __detail::__platform_wait_t __val;
+	  if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
+	    return;
+	  __detail::__platform_wait(__addr, __val);
 	}
+      while (!__pred());
+#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
+      __detail::__bare_wait __w(__addr);
+      __w._M_do_wait(__pred);
+#endif
     }
 
   template<typename _Tp>
     void
-    __atomic_notify(const _Tp* __addr, bool __all) noexcept
+    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
     {
-      using namespace __detail;
-      auto& __w = __waiters::_S_for((void*)__addr);
-      if (!__w._M_waiting())
-	return;
-
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-      if constexpr (__platform_wait_uses_type<_Tp>)
-	{
-	  __platform_notify((__platform_wait_t*)(void*) __addr, __all);
-	}
-      else
-#endif
-	{
-	  __w._M_notify(__all);
-	}
+      __detail::__bare_wait __w(__addr);
+      __w._M_notify(__all);
     }
+
+  // This call is to be used by atomic types which track contention externally
+  inline void
+  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
+			       bool __all) noexcept
+  {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+    __detail::__platform_notify(__addr, __all);
+#else
+    __detail::__bare_wait __w(__addr);
+    __w._M_notify(__all);
+#endif
+  }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // GTHREADS || LINUX_FUTEX
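
For reviewers: the three-phase structure of the new __atomic_spin (a busy loop, a yielding loop, then a loop driven by the spin policy) can be exercised outside the library. This is a minimal sketch under stated assumptions, not the patch's code. The names spin_until and default_spin_policy are hypothetical, the iteration counts 12 and 4 are taken from __atomic_spin_count_1 and __atomic_spin_count_2, and the CPU pause used by __thread_relax is left out for portability.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for __default_spin_policy: never keep spinning.
struct default_spin_policy
{
  bool operator()() const noexcept { return false; }
};

// Returns true as soon as pred() succeeds, false once every phase gives up.
template<typename Pred, typename Spin = default_spin_policy>
bool
spin_until(Pred& pred, Spin spin = Spin{})
{
  constexpr int relax_iters = 12; // assumed from __atomic_spin_count_1
  constexpr int yield_iters = 4;  // assumed from __atomic_spin_count_2

  // Phase 1: tight loop; the patch also issues a CPU pause here.
  for (int i = 0; i < relax_iters; ++i)
    {
      if (pred())
	return true;
    }

  // Phase 2: give other threads a chance between attempts.
  for (int i = 0; i < yield_iters; ++i)
    {
      if (pred())
	return true;
      std::this_thread::yield();
    }

  // Phase 3: keep retrying for as long as the policy allows.
  while (spin())
    {
      if (pred())
	return true;
    }

  return false;
}
```

With the default policy the predicate is evaluated at most 16 times before the caller falls back to a blocking wait. A policy that returns true N times adds N further evaluations.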
diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index b65717e64d7..c21624e0988 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -35,8 +35,8 @@ 
 #include <bits/atomic_base.h>
 #if __cpp_lib_atomic_wait
 #include <bits/atomic_timed_wait.h>
-
 #include <ext/numeric_traits.h>
+#endif // __cpp_lib_atomic_wait
 
 #ifdef _GLIBCXX_HAVE_POSIX_SEMAPHORE
 # include <limits.h>
@@ -164,138 +164,100 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 #endif // _GLIBCXX_HAVE_POSIX_SEMAPHORE
 
-  template<typename _Tp>
-    struct __atomic_semaphore
+#if __cpp_lib_atomic_wait
+  struct __atomic_semaphore
+  {
+    static constexpr ptrdiff_t _S_max = __gnu_cxx::__int_traits<int>::__max;
+    explicit __atomic_semaphore(__detail::__platform_wait_t __count) noexcept
+      : _M_counter(__count)
     {
-      static_assert(std::is_integral_v<_Tp>);
-      static_assert(__gnu_cxx::__int_traits<_Tp>::__max
-		      <= __gnu_cxx::__int_traits<ptrdiff_t>::__max);
-      static constexpr ptrdiff_t _S_max = __gnu_cxx::__int_traits<_Tp>::__max;
+      __glibcxx_assert(__count >= 0 && __count <= _S_max);
+    }
 
-      explicit __atomic_semaphore(_Tp __count) noexcept
-	: _M_counter(__count)
+    __atomic_semaphore(const __atomic_semaphore&) = delete;
+    __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
+
+    static _GLIBCXX_ALWAYS_INLINE bool
+    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
+		      __detail::__platform_wait_t& __old) noexcept
+    {
+      if (__old == 0)
+	return false;
+
+      return __atomic_impl::compare_exchange_strong(__counter,
+						    __old, __old - 1,
+						    memory_order::acquire,
+						    memory_order::relaxed);
+    }
+
+    _GLIBCXX_ALWAYS_INLINE void
+    _M_acquire() noexcept
+    {
+      auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
+      auto const __pred =
+	[this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+      std::__atomic_wait_address_bare(&_M_counter, __pred);
+    }
+
+    bool
+    _M_try_acquire() noexcept
+    {
+      auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
+      auto const __pred =
+	[this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+      return std::__detail::__atomic_spin(__pred);
+    }
+
+    template<typename _Clock, typename _Duration>
+      _GLIBCXX_ALWAYS_INLINE bool
+      _M_try_acquire_until(const chrono::time_point<_Clock,
+			   _Duration>& __atime) noexcept
       {
-	__glibcxx_assert(__count >= 0 && __count <= _S_max);
-      }
-
-      __atomic_semaphore(const __atomic_semaphore&) = delete;
-      __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
-
-      _GLIBCXX_ALWAYS_INLINE void
-      _M_acquire() noexcept
-      {
-	auto const __pred = [this]
-	  {
-	    auto __old = __atomic_impl::load(&this->_M_counter,
-			    memory_order::acquire);
-	    if (__old == 0)
-	      return false;
-	    return __atomic_impl::compare_exchange_strong(&this->_M_counter,
-		      __old, __old - 1,
-		      memory_order::acquire,
-		      memory_order::release);
-	  };
 	auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
-	std::__atomic_wait(&_M_counter, __old, __pred);
+	auto const __pred =
+	  [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+
+	return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
       }
 
-      bool
-      _M_try_acquire() noexcept
+    template<typename _Rep, typename _Period>
+      _GLIBCXX_ALWAYS_INLINE bool
+      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
+	noexcept
       {
-	auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
-	auto const __pred = [this, __old]
-	  {
-	    if (__old == 0)
-	      return false;
+	auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
+	auto const __pred =
+	  [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
 
-	    auto __prev = __old;
-	    return __atomic_impl::compare_exchange_weak(&this->_M_counter,
-		      __prev, __prev - 1,
-		      memory_order::acquire,
-		      memory_order::release);
-	  };
-	return std::__atomic_spin(__pred);
+	return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
       }
 
-      template<typename _Clock, typename _Duration>
-	_GLIBCXX_ALWAYS_INLINE bool
-	_M_try_acquire_until(const chrono::time_point<_Clock,
-			     _Duration>& __atime) noexcept
-	{
-	  auto const __pred = [this]
-	    {
-	      auto __old = __atomic_impl::load(&this->_M_counter,
-			      memory_order::acquire);
-	      if (__old == 0)
-		return false;
-	      return __atomic_impl::compare_exchange_strong(&this->_M_counter,
-			      __old, __old - 1,
-			      memory_order::acquire,
-			      memory_order::release);
-	    };
+    _GLIBCXX_ALWAYS_INLINE void
+    _M_release(ptrdiff_t __update) noexcept
+    {
+      if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
+	return;
+      if (__update > 1)
+	__atomic_notify_address_bare(&_M_counter, true);
+      else
+	__atomic_notify_address_bare(&_M_counter, false);
+    }
 
-	  auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
-	  return __atomic_wait_until(&_M_counter, __old, __pred, __atime);
-	}
-
-      template<typename _Rep, typename _Period>
-	_GLIBCXX_ALWAYS_INLINE bool
-	_M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
-	  noexcept
-	{
-	  auto const __pred = [this]
-	    {
-	      auto __old = __atomic_impl::load(&this->_M_counter,
-			      memory_order::acquire);
-	      if (__old == 0)
-		return false;
-	      return  __atomic_impl::compare_exchange_strong(&this->_M_counter,
-			      __old, __old - 1,
-			      memory_order::acquire,
-			      memory_order::release);
-	    };
-
-	  auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
-	  return __atomic_wait_for(&_M_counter, __old, __pred, __rtime);
-	}
-
-      _GLIBCXX_ALWAYS_INLINE void
-      _M_release(ptrdiff_t __update) noexcept
-      {
-	if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
-	  return;
-	if (__update > 1)
-	  __atomic_impl::notify_all(&_M_counter);
-	else
-	  __atomic_impl::notify_one(&_M_counter);
-      }
-
-    private:
-      alignas(__alignof__(_Tp)) _Tp _M_counter;
-    };
+  private:
+    __detail::__platform_wait_t _M_counter;
+  };
+#endif // __cpp_lib_atomic_wait
 
 // Note: the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro can be used to force the
 // use of Posix semaphores (sem_t). Doing so however, alters the ABI.
-#if defined _GLIBCXX_HAVE_LINUX_FUTEX && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE
-  // Use futex if available and didn't force use of POSIX
-  using __fast_semaphore = __atomic_semaphore<__detail::__platform_wait_t>;
+#if defined __cpp_lib_atomic_wait && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE
+  using __semaphore_impl = __atomic_semaphore;
 #elif _GLIBCXX_HAVE_POSIX_SEMAPHORE
-  using __fast_semaphore = __platform_semaphore;
+  using __semaphore_impl = __platform_semaphore;
 #else
-  using __fast_semaphore = __atomic_semaphore<ptrdiff_t>;
+#  error "No suitable semaphore implementation available"
 #endif
 
-template<ptrdiff_t __least_max_value>
-  using __semaphore_impl = conditional_t<
-		(__least_max_value > 1),
-		conditional_t<
-		    (__least_max_value <= __fast_semaphore::_S_max),
-		    __fast_semaphore,
-		    __atomic_semaphore<ptrdiff_t>>,
-		__fast_semaphore>;
-
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
-
-#endif // __cpp_lib_atomic_wait
 #endif // _GLIBCXX_SEMAPHORE_BASE_H
diff --git a/libstdc++-v3/include/bits/std_thread_sleep.h b/libstdc++-v3/include/bits/std_thread_sleep.h
new file mode 100644
index 00000000000..545bff2aea3
--- /dev/null
+++ b/libstdc++-v3/include/bits/std_thread_sleep.h
@@ -0,0 +1,119 @@ 
+// std::this_thread::sleep_for/until declarations -*- C++ -*-
+
+// Copyright (C) 2008-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/std_thread_sleep.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{thread}
+ */
+
+#ifndef _GLIBCXX_THREAD_SLEEP_H
+#define _GLIBCXX_THREAD_SLEEP_H 1
+
+#pragma GCC system_header
+
+#if __cplusplus >= 201103L
+#include <bits/c++config.h>
+
+#include <chrono> // std::chrono::*
+
+#ifdef _GLIBCXX_USE_NANOSLEEP
+# include <cerrno>  // errno, EINTR
+# include <time.h>  // nanosleep
+#endif
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /** @addtogroup threads
+   *  @{
+   */
+
+  /** @namespace std::this_thread
+   *  @brief ISO C++ 2011 namespace for interacting with the current thread
+   *
+   *  C++11 30.3.2 [thread.thread.this] Namespace this_thread.
+   */
+  namespace this_thread
+  {
+#ifndef _GLIBCXX_NO_SLEEP
+
+#ifndef _GLIBCXX_USE_NANOSLEEP
+    void
+    __sleep_for(chrono::seconds, chrono::nanoseconds);
+#endif
+
+    /// this_thread::sleep_for
+    template<typename _Rep, typename _Period>
+      inline void
+      sleep_for(const chrono::duration<_Rep, _Period>& __rtime)
+      {
+	if (__rtime <= __rtime.zero())
+	  return;
+	auto __s = chrono::duration_cast<chrono::seconds>(__rtime);
+	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__rtime - __s);
+#ifdef _GLIBCXX_USE_NANOSLEEP
+	struct ::timespec __ts =
+	  {
+	    static_cast<std::time_t>(__s.count()),
+	    static_cast<long>(__ns.count())
+	  };
+	while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
+	  { }
+#else
+	__sleep_for(__s, __ns);
+#endif
+      }
+
+    /// this_thread::sleep_until
+    template<typename _Clock, typename _Duration>
+      inline void
+      sleep_until(const chrono::time_point<_Clock, _Duration>& __atime)
+      {
+#if __cplusplus > 201703L
+	static_assert(chrono::is_clock_v<_Clock>);
+#endif
+	auto __now = _Clock::now();
+	if (_Clock::is_steady)
+	  {
+	    if (__now < __atime)
+	      sleep_for(__atime - __now);
+	    return;
+	  }
+	while (__now < __atime)
+	  {
+	    sleep_for(__atime - __now);
+	    __now = _Clock::now();
+	  }
+      }
+#endif // ! NO_SLEEP
+  } // namespace this_thread
+
+  /// @}
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+#endif // C++11
+
+#endif // _GLIBCXX_THREAD_SLEEP_H
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index de5591d8e14..a56da8a9683 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -384,26 +384,19 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     wait(_Tp __old, memory_order __m = memory_order_seq_cst) const noexcept
     {
-      std::__atomic_wait(&_M_i, __old,
-			 [__m, this, __old]
-			 {
-			   const auto __v = this->load(__m);
-			   // TODO make this ignore padding bits when we
-			   // can do that
-			   return __builtin_memcmp(&__old, &__v,
-						    sizeof(_Tp)) != 0;
-			 });
+      std::__atomic_wait_address_v(&_M_i, __old,
+			 [__m, this] { return this->load(__m); });
     }
 
     // TODO add const volatile overload
 
     void
     notify_one() const noexcept
-    { std::__atomic_notify(&_M_i, false); }
+    { std::__atomic_notify_address(&_M_i, false); }
 
     void
     notify_all() const noexcept
-    { std::__atomic_notify(&_M_i, true); }
+    { std::__atomic_notify_address(&_M_i, true); }
 #endif // __cpp_lib_atomic_wait 
 
     };
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index e09212dfcb9..1f21fa759d0 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -94,7 +94,7 @@  It looks different from literature pseudocode for two main reasons:
       alignas(__phase_alignment) __barrier_phase_t  _M_phase;
 
       bool
-      _M_arrive(__barrier_phase_t __old_phase)
+      _M_arrive(__barrier_phase_t __old_phase, size_t __current)
       {
 	const auto __old_phase_val = static_cast<unsigned char>(__old_phase);
 	const auto __half_step =
@@ -104,8 +104,7 @@  It looks different from literature pseudocode for two main reasons:
 
 	size_t __current_expected = _M_expected;
 	std::hash<std::thread::id> __hasher;
-	size_t __current = __hasher(std::this_thread::get_id())
-					  % ((_M_expected + 1) >> 1);
+	__current %= ((_M_expected + 1) >> 1);
 
 	for (int __round = 0; ; ++__round)
 	  {
@@ -163,12 +162,14 @@  It looks different from literature pseudocode for two main reasons:
       [[nodiscard]] arrival_token
       arrive(ptrdiff_t __update)
       {
+	std::hash<std::thread::id> __hasher;
+	size_t __current = __hasher(std::this_thread::get_id());
 	__atomic_phase_ref_t __phase(_M_phase);
 	const auto __old_phase = __phase.load(memory_order_relaxed);
 	const auto __cur = static_cast<unsigned char>(__old_phase);
 	for(; __update; --__update)
 	  {
-	    if(_M_arrive(__old_phase))
+	    if(_M_arrive(__old_phase, __current))
 	      {
 		_M_completion();
 		_M_expected += _M_expected_adjustment.load(memory_order_relaxed);
@@ -185,11 +186,11 @@  It looks different from literature pseudocode for two main reasons:
       wait(arrival_token&& __old_phase) const
       {
 	__atomic_phase_const_ref_t __phase(_M_phase);
-	auto const __test_fn = [=, this]
+	auto const __test_fn = [=]
 	  {
 	    return __phase.load(memory_order_acquire) != __old_phase;
 	  };
-	std::__atomic_wait(&_M_phase, __old_phase, __test_fn);
+	std::__atomic_wait_address(&_M_phase, __test_fn);
       }
 
       void
diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index ef8c301e5e9..20b75f8181a 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -48,7 +48,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   public:
     static constexpr ptrdiff_t
     max() noexcept
-    { return __gnu_cxx::__int_traits<ptrdiff_t>::__max; }
+    { return __gnu_cxx::__int_traits<__detail::__platform_wait_t>::__max; }
 
     constexpr explicit latch(ptrdiff_t __expected) noexcept
       : _M_a(__expected) { }
@@ -73,8 +73,8 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     wait() const noexcept
     {
-      auto const __old = __atomic_impl::load(&_M_a, memory_order::acquire);
-      std::__atomic_wait(&_M_a, __old, [this] { return this->try_wait(); });
+      auto const __pred = [this] { return this->try_wait(); };
+      std::__atomic_wait_address(&_M_a, __pred);
     }
 
     _GLIBCXX_ALWAYS_INLINE void
@@ -85,7 +85,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   private:
-    alignas(__alignof__(ptrdiff_t)) ptrdiff_t _M_a;
+    alignas(__alignof__(__detail::__platform_wait_t)) __detail::__platform_wait_t _M_a;
   };
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/include/std/semaphore b/libstdc++-v3/include/std/semaphore
index 40af41b44d9..02a8214e569 100644
--- a/libstdc++-v3/include/std/semaphore
+++ b/libstdc++-v3/include/std/semaphore
@@ -33,8 +33,6 @@ 
 
 #if __cplusplus > 201703L
 #include <bits/semaphore_base.h>
-#if __cpp_lib_atomic_wait
-#include <ext/numeric_traits.h>
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -42,13 +40,13 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #define __cpp_lib_semaphore 201907L
 
-  template<ptrdiff_t __least_max_value =
-			__gnu_cxx::__int_traits<ptrdiff_t>::__max>
+  template<ptrdiff_t __least_max_value = __semaphore_impl::_S_max>
     class counting_semaphore
     {
       static_assert(__least_max_value >= 0);
+      static_assert(__least_max_value <= __semaphore_impl::_S_max);
 
-      __semaphore_impl<__least_max_value> _M_sem;
+      __semaphore_impl _M_sem;
 
     public:
       explicit counting_semaphore(ptrdiff_t __desired) noexcept
@@ -91,6 +89,5 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-#endif // __cpp_lib_atomic_wait
 #endif // C++20
 #endif // _GLIBCXX_SEMAPHORE
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index ad383395ee9..63c0f38a83c 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -35,19 +35,13 @@ 
 # include <bits/c++0x_warning.h>
 #else
 
-#include <chrono> // std::chrono::*
-
 #if __cplusplus > 201703L
 # include <compare>	// std::strong_ordering
 # include <stop_token>	// std::stop_source, std::stop_token, std::nostopstate
 #endif
 
 #include <bits/std_thread.h> // std::thread, get_id, yield
-
-#ifdef _GLIBCXX_USE_NANOSLEEP
-# include <cerrno>  // errno, EINTR
-# include <time.h>  // nanosleep
-#endif
+#include <bits/std_thread_sleep.h> // std::this_thread::sleep_for, sleep_until
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -103,66 +97,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __out << __id._M_thread;
     }
 
-  /** @namespace std::this_thread
-   *  @brief ISO C++ 2011 namespace for interacting with the current thread
-   *
-   *  C++11 30.3.2 [thread.thread.this] Namespace this_thread.
-   */
-  namespace this_thread
-  {
-#ifndef _GLIBCXX_NO_SLEEP
-
-#ifndef _GLIBCXX_USE_NANOSLEEP
-    void
-    __sleep_for(chrono::seconds, chrono::nanoseconds);
-#endif
-
-    /// this_thread::sleep_for
-    template<typename _Rep, typename _Period>
-      inline void
-      sleep_for(const chrono::duration<_Rep, _Period>& __rtime)
-      {
-	if (__rtime <= __rtime.zero())
-	  return;
-	auto __s = chrono::duration_cast<chrono::seconds>(__rtime);
-	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__rtime - __s);
-#ifdef _GLIBCXX_USE_NANOSLEEP
-	struct ::timespec __ts =
-	  {
-	    static_cast<std::time_t>(__s.count()),
-	    static_cast<long>(__ns.count())
-	  };
-	while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
-	  { }
-#else
-	__sleep_for(__s, __ns);
-#endif
-      }
-
-    /// this_thread::sleep_until
-    template<typename _Clock, typename _Duration>
-      inline void
-      sleep_until(const chrono::time_point<_Clock, _Duration>& __atime)
-      {
-#if __cplusplus > 201703L
-	static_assert(chrono::is_clock_v<_Clock>);
-#endif
-	auto __now = _Clock::now();
-	if (_Clock::is_steady)
-	  {
-	    if (__now < __atime)
-	      sleep_for(__atime - __now);
-	    return;
-	  }
-	while (__now < __atime)
-	  {
-	    sleep_for(__atime - __now);
-	    __now = _Clock::now();
-	  }
-      }
-  } // namespace this_thread
-#endif // ! NO_SLEEP
-
 #ifdef __cpp_lib_jthread
 
   /// A thread that can be requested to stop and automatically joined.
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
index 0550f17c69d..26a7dfbfcec 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
@@ -22,42 +22,21 @@ 
 
 #include <atomic>
 #include <thread>
-#include <mutex>
-#include <condition_variable>
-#include <type_traits>
-#include <chrono>
 
 #include <testsuite_hooks.h>
 
 int
 main ()
 {
-  using namespace std::literals::chrono_literals;
-
-  std::mutex m;
-  std::condition_variable cv;
-  std::unique_lock<std::mutex> l(m);
-
-  std::atomic<bool> a(false);
-  std::atomic<bool> b(false);
+  std::atomic<bool> a{ true };
+  VERIFY( a.load() );
+  a.wait(false);
   std::thread t([&]
-		{
-		  {
-		    // This ensures we block until cv.wait(l) starts.
-		    std::lock_guard<std::mutex> ll(m);
-		  }
-		  cv.notify_one();
-		  a.wait(false);
-		  if (a.load())
-		    {
-		      b.store(true);
-		    }
-		});
-  cv.wait(l);
-  std::this_thread::sleep_for(100ms);
-  a.store(true);
-  a.notify_one();
+    {
+      a.store(false);
+      a.notify_one();
+    });
+  a.wait(true);
   t.join();
-  VERIFY( b.load() );
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
index 9ab1b071c96..0f1b9cd69d2 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
@@ -20,12 +20,27 @@ 
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-#include "atomic/wait_notify_util.h"
+#include <atomic>
+#include <thread>
+
+#include <testsuite_hooks.h>
 
 int
 main ()
 {
   struct S{ int i; };
-  check<S> check_s{S{0},S{42}};
+  S aa{ 0 };
+  S bb{ 42 };
+
+  std::atomic<S> a{ aa };
+  VERIFY( a.load().i == aa.i );
+  a.wait(bb);
+  std::thread t([&]
+    {
+      a.store(bb);
+      a.notify_one();
+    });
+  a.wait(aa);
+  t.join();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
index cc63694f596..17365a17228 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
@@ -22,42 +22,24 @@ 
 
 #include <atomic>
 #include <thread>
-#include <mutex>
-#include <condition_variable>
-#include <type_traits>
-#include <chrono>
 
 #include <testsuite_hooks.h>
 
 int
 main ()
 {
-  using namespace std::literals::chrono_literals;
-
-  std::mutex m;
-  std::condition_variable cv;
-  std::unique_lock<std::mutex> l(m);
-
   long aa;
   long bb;
-
-  std::atomic<long*> a(nullptr);
+  std::atomic<long*> a(&aa);
+  VERIFY( a.load() == &aa );
+  a.wait(&bb);
   std::thread t([&]
-		{
-		  {
-		    // This ensures we block until cv.wait(l) starts.
-		    std::lock_guard<std::mutex> ll(m);
-		  }
-		  cv.notify_one();
-		  a.wait(nullptr);
-		  if (a.load() == &aa)
-		    a.store(&bb);
-		});
-  cv.wait(l);
-  std::this_thread::sleep_for(100ms);
-  a.store(&aa);
-  a.notify_one();
+    {
+      a.store(&bb);
+      a.notify_one();
+    });
+  a.wait(&aa);
   t.join();
-  VERIFY( a.load() == &bb);
+
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
index 45b68c5bbb8..9d12889ed59 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
@@ -21,10 +21,6 @@ 
 // <http://www.gnu.org/licenses/>.
 
 #include <atomic>
-#include <chrono>
-#include <condition_variable>
-#include <concepts>
-#include <mutex>
 #include <thread>
 
 #include <testsuite_hooks.h>
@@ -32,34 +28,15 @@ 
 int
 main()
 {
-  using namespace std::literals::chrono_literals;
-
-  std::mutex m;
-  std::condition_variable cv;
-  std::unique_lock<std::mutex> l(m);
-
   std::atomic_flag a;
-  std::atomic_flag b;
+  VERIFY( !a.test() );
+  a.wait(true);
   std::thread t([&]
-		{
-		  {
-		    // This ensures we block until cv.wait(l) starts.
-		    std::lock_guard<std::mutex> ll(m);
-		  }
-		  cv.notify_one();
-		  a.wait(false);
-		  b.test_and_set();
-		  b.notify_one();
-		});
-
-  cv.wait(l);
-  std::this_thread::sleep_for(100ms);
-  a.test_and_set();
-  a.notify_one();
-  b.wait(false);
+    {
+      a.test_and_set();
+      a.notify_one();
+    });
+  a.wait(false);
   t.join();
-
-  VERIFY( a.test() );
-  VERIFY( b.test() );
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
index d8ec5fbe24e..01768da290b 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
@@ -21,12 +21,32 @@ 
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-#include "atomic/wait_notify_util.h"
+
+#include <atomic>
+#include <thread>
+
+#include <testsuite_hooks.h>
+
+template<typename Tp>
+  void
+  check()
+  {
+    std::atomic<Tp> a{ 1.0 };
+    VERIFY( a.load() != 0.0 );
+    a.wait( 0.0 );
+    std::thread t([&]
+      {
+        a.store(0.0);
+        a.notify_one();
+      });
+    a.wait(1.0);
+    t.join();
+  }
 
 int
 main ()
 {
-  check<float> f;
-  check<double> d;
+  check<float>();
+  check<double>();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
index 19c1ec4bc12..d1bf0811602 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
@@ -21,46 +21,57 @@ 
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-#include "atomic/wait_notify_util.h"
 
-void
-test01()
-{
-  struct S{ int i; };
-  std::atomic<S> s;
+#include <atomic>
+#include <thread>
 
-  s.wait(S{42});
-}
+#include <testsuite_hooks.h>
+
+template<typename Tp>
+  void
+  check()
+  {
+    std::atomic<Tp> a{ Tp(1) };
+    VERIFY( a.load() == Tp(1) );
+    a.wait( Tp(0) );
+    std::thread t([&]
+      {
+        a.store(Tp(0));
+        a.notify_one();
+      });
+    a.wait(Tp(1));
+    t.join();
+  }
 
 int
 main ()
 {
   // check<bool> bb;
-  check<char> ch;
-  check<signed char> sch;
-  check<unsigned char> uch;
-  check<short> s;
-  check<unsigned short> us;
-  check<int> i;
-  check<unsigned int> ui;
-  check<long> l;
-  check<unsigned long> ul;
-  check<long long> ll;
-  check<unsigned long long> ull;
+  check<char>();
+  check<signed char>();
+  check<unsigned char>();
+  check<short>();
+  check<unsigned short>();
+  check<int>();
+  check<unsigned int>();
+  check<long>();
+  check<unsigned long>();
+  check<long long>();
+  check<unsigned long long>();
 
-  check<wchar_t> wch;
-  check<char8_t> ch8;
-  check<char16_t> ch16;
-  check<char32_t> ch32;
+  check<wchar_t>();
+  check<char8_t>();
+  check<char16_t>();
+  check<char32_t>();
 
-  check<int8_t> i8;
-  check<int16_t> i16;
-  check<int32_t> i32;
-  check<int64_t> i64;
+  check<int8_t>();
+  check<int16_t>();
+  check<int32_t>();
+  check<int64_t>();
 
-  check<uint8_t> u8;
-  check<uint16_t> u16;
-  check<uint32_t> u32;
-  check<uint64_t> u64;
+  check<uint8_t>();
+  check<uint16_t>();
+  check<uint32_t>();
+  check<uint64_t>();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
index a6740857172..2fd31304222 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
@@ -23,73 +23,25 @@ 
 
 #include <atomic>
 #include <thread>
-#include <mutex>
-#include <condition_variable>
-#include <chrono>
-#include <type_traits>
 
 #include <testsuite_hooks.h>
 
-template<typename Tp>
-Tp check_wait_notify(Tp val1, Tp val2)
-{
-  using namespace std::literals::chrono_literals;
-
-  std::mutex m;
-  std::condition_variable cv;
-  std::unique_lock<std::mutex> l(m);
-
-  Tp aa = val1;
-  std::atomic_ref<Tp> a(aa);
-  std::thread t([&]
-		{
-		  {
-		    // This ensures we block until cv.wait(l) starts.
-		    std::lock_guard<std::mutex> ll(m);
-		  }
-		  cv.notify_one();
-		  a.wait(val1);
-		  if (a.load() != val2)
-		    a = val1;
-		});
-  cv.wait(l);
-  std::this_thread::sleep_for(100ms);
-  a.store(val2);
-  a.notify_one();
-  t.join();
-  return a.load();
-}
-
-template<typename Tp,
-	 bool = std::is_integral_v<Tp>
-	 || std::is_floating_point_v<Tp>>
-struct check;
-
-template<typename Tp>
-struct check<Tp, true>
-{
-  check()
-  {
-    Tp a = 0;
-    Tp b = 42;
-    VERIFY(check_wait_notify(a, b) == b);
-  }
-};
-
-template<typename Tp>
-struct check<Tp, false>
-{
-  check(Tp b)
-  {
-    Tp a;
-    VERIFY(check_wait_notify(a, b) == b);
-  }
-};
-
 int
 main ()
 {
-  check<long>();
-  check<double>();
+  struct S{ int i; };
+  S aa{ 0 };
+  S bb{ 42 };
+
+  std::atomic_ref<S> a{ aa };
+  VERIFY( a.load().i == aa.i );
+  a.wait(bb);
+  std::thread t([&]
+    {
+      a.store(bb);
+      a.notify_one();
+    });
+  a.wait(aa);
+  t.join();
   return 0;
 }