Make futex functions free functions instead of members

Summary: The current futex API required a reference to a futex object in order to invoke `futexWake()`, although this is not buggy by itself, it is technically UB and nothing is stopping ASAN from catching this sort of use-after-free and reporting it as an error. Especially when the futex is represented as a pointer, requiring a dereference to reach a member function The bug can come up when you call `futexWake()` on a futex that has been destroyed, for example ``` auto&& futex_ptr = std::atomic<Futex<>*>{nullptr}; auto&& thread = std::thread{[&]() { auto&& futex = Futex<>{0}; futex_ptr.store(&futex); while (futex.load(std::memory_order_relaxed) != 1) { futex.futexWait(0); } }}; while (!futex_ptr.load()) {} futex_ptr.load()->store(1); futex_ptr.load()->futexWake(1); thread.join(); ``` Here immediately after the `store(1)`, our thread could have loaded the value, seen that it had changed, and never went to sleep. Or it could have seen the value as 0, went to sleep and immediately returned when it saw that the value in the futex word was not what was expected. In the scenario described above calling `futexWake()` is done on a "dangling" pointer. To avoid this, we just never dereference the pointer, and pass the pointer to the futex syscall, where it will do the right things A side benefit to the refactor is that adding specializations is very easy. And we don't have to mess with member function specializations anymore, which are inherently hard to work with (eg. cannot partially specialize) The ADL extension points (currently implemented for `Futex<std::atomic>`, `Futex<DeterministicAtomic>` and `Futex<EmulatedFutexAtomic>`) are ``` int futexWakeImpl(FutexType* futex, int count, uint32_t wakeMask); FutexResult futexWaitImpl( FutexType* futex, uint32_t expected, std::chrono::system_clock::time_point const* absSystemTime, std::chrono::steady_clock::time_point const* absSteadyTime, uint32_t waitMask); ``` Reviewed By: yfeldblum Differential Revision: D9376527 fbshipit-source-id: bb2b54e511fdf1da992c630a9bc7dc37f76da641

Make futex functions free functions instead of members
Summary: The current futex API required a reference to a futex object in order to invoke `futexWake()`, although this is not buggy by itself, it is technically UB and nothing is stopping ASAN from catching this sort of use-after-free and reporting it as an error. Especially when the futex is represented as a pointer, requiring a dereference to reach a member function The bug can come up when you call `futexWake()` on a futex that has been destroyed, for example ``` auto&& futex_ptr = std::atomic<Futex<>*>{nullptr}; auto&& thread = std::thread{[&]() { auto&& futex = Futex<>{0}; futex_ptr.store(&futex); while (futex.load(std::memory_order_relaxed) != 1) { futex.futexWait(0); } }}; while (!futex_ptr.load()) {} futex_ptr.load()->store(1); futex_ptr.load()->futexWake(1); thread.join(); ``` Here immediately after the `store(1)`, our thread could have loaded the value, seen that it had changed, and never went to sleep. Or it could have seen the value as 0, went to sleep and immediately returned when it saw that the value in the futex word was not what was expected. In the scenario described above calling `futexWake()` is done on a "dangling" pointer. To avoid this, we just never dereference the pointer, and pass the pointer to the futex syscall, where it will do the right things A side benefit to the refactor is that adding specializations is very easy. And we don't have to mess with member function specializations anymore, which are inherently hard to work with (eg. cannot partially specialize) The ADL extension points (currently implemented for `Futex<std::atomic>`, `Futex<DeterministicAtomic>` and `Futex<EmulatedFutexAtomic>`) are ``` int futexWakeImpl(FutexType* futex, int count, uint32_t wakeMask); FutexResult futexWaitImpl( FutexType* futex, uint32_t expected, std::chrono::system_clock::time_point const* absSystemTime, std::chrono::steady_clock::time_point const* absSteadyTime, uint32_t waitMask); ``` Reviewed By: yfeldblum Differential Revision: D9376527 fbshipit-source-id: bb2b54e511fdf1da992c630a9bc7dc37f76da641
4d234b99 · Aaryaman Sagar · Facebook Github Bot · 2f002209 · 4d234b99 · 4d234b99
Commit 4d234b99 authored Aug 28, 2018 by Aaryaman Sagar Committed by Facebook Github Bot Aug 28, 2018
22 changed files
--- a/folly/MicroLock.cpp
+++ b/folly/MicroLock.cpp
@@ -47,7 +47,7 @@ retry:
          goto retry;
        }
      }
-      (void)wordPtr->futexWait(newWord, slotHeldBit);
+      detail::futexWait(wordPtr, newWord, slotHeldBit);
      needWaitBit = slotWaitBit;
    } else if (spins > maxSpins) {
      // sched_yield(), but more portable

--- a/folly/MicroLock.h
+++ b/folly/MicroLock.h
@@ -157,7 +157,7 @@ void MicroLockCore::unlock(unsigned slot) {
      oldWord, newWord, std::memory_order_release, std::memory_order_relaxed));

  if (oldWord & waitBit(slot)) {
-    (void)wordPtr->futexWake(1, heldBit(slot));
+    detail::futexWake(wordPtr, 1, heldBit(slot));
  }
 }


--- a/folly/SharedMutex.h
+++ b/folly/SharedMutex.h
@@ -513,7 +513,7 @@ class SharedMutexImpl {
    }

    bool doWait(Futex& futex, uint32_t expected, uint32_t waitMask) {
-      futex.futexWait(expected, waitMask);
+      detail::futexWait(&futex, expected, waitMask);
      return true;
    }
  };
@@ -566,7 +566,8 @@ class SharedMutexImpl {
    }

    bool doWait(Futex& futex, uint32_t expected, uint32_t waitMask) {
-      auto result = futex.futexWaitUntil(expected, deadline(), waitMask);
+      auto result =
+          detail::futexWaitUntil(&futex, expected, deadline(), waitMask);
      return result != folly::detail::FutexResult::TIMEDOUT;
    }
  };
@@ -586,7 +587,8 @@ class SharedMutexImpl {
    }

    bool doWait(Futex& futex, uint32_t expected, uint32_t waitMask) {
-      auto result = futex.futexWaitUntil(expected, absDeadline_, waitMask);
+      auto result =
+          detail::futexWaitUntil(&futex, expected, absDeadline_, waitMask);
      return result != folly::detail::FutexResult::TIMEDOUT;
    }
  };
@@ -979,7 +981,8 @@ class SharedMutexImpl {
    // wakeup, we just disable the optimization in the case that there
    // are waiting U or S that we are eligible to wake.
    if ((wakeMask & kWaitingE) == kWaitingE &&
-        (state & wakeMask) == kWaitingE && state_.futexWake(1, kWaitingE) > 0) {
+        (state & wakeMask) == kWaitingE &&
+        detail::futexWake(&state_, 1, kWaitingE) > 0) {
      // somebody woke up, so leave state_ as is and clear it later
      return;
    }
@@ -994,7 +997,7 @@ class SharedMutexImpl {
  }

  void futexWakeAll(uint32_t wakeMask) {
-    state_.futexWake(std::numeric_limits<int>::max(), wakeMask);
+    detail::futexWake(&state_, std::numeric_limits<int>::max(), wakeMask);
  }

  DeferredReaderSlot* deferredReader(uint32_t slot) {

--- a/folly/concurrency/DynamicBoundedQueue.h
+++ b/folly/concurrency/DynamicBoundedQueue.h
@@ -601,7 +601,7 @@ class DynamicBoundedQueue {
      }
      if (MayBlock) {
        if (canBlock(weight, capacity)) {
-          waiting_.futexWaitUntil(WAITING, deadline);
+          detail::futexWaitUntil(&waiting_, WAITING, deadline);
        }
      } else {
        asm_volatile_pause();
@@ -645,7 +645,7 @@ class DynamicBoundedQueue {
    if (MayBlock) {
      std::atomic_thread_fence(std::memory_order_seq_cst);
      waiting_.store(NOTWAITING, std::memory_order_relaxed);
-      waiting_.futexWake();
+      detail::futexWake(&waiting_);
    }
  }


--- a/folly/detail/Futex-inl.h
+++ b/folly/detail/Futex-inl.h
+/*
+ * Copyright 2013-present Facebook, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#pragma once
+
+#include <folly/detail/Futex.h>
+
+namespace folly {
+namespace detail {
+
+/** Optimal when TargetClock is the same type as Clock.
+ *
+ *  Otherwise, both Clock::now() and TargetClock::now() must be invoked. */
+template <typename TargetClock, typename Clock, typename Duration>
+typename TargetClock::time_point time_point_conv(
+    std::chrono::time_point<Clock, Duration> const& time) {
+  using std::chrono::duration_cast;
+  using TimePoint = std::chrono::time_point<Clock, Duration>;
+  using TargetDuration = typename TargetClock::duration;
+  using TargetTimePoint = typename TargetClock::time_point;
+  if (time == TimePoint::max()) {
+    return TargetTimePoint::max();
+  } else if (std::is_same<Clock, TargetClock>::value) {
+    // in place of time_point_cast, which cannot compile without if-constexpr
+    auto const delta = time.time_since_epoch();
+    return TargetTimePoint(duration_cast<TargetDuration>(delta));
+  } else {
+    // different clocks with different epochs, so non-optimal case
+    auto const delta = time - Clock::now();
+    return TargetClock::now() + duration_cast<TargetDuration>(delta);
+  }
+}
+
+/**
+ * Available overloads, with definitions elsewhere
+ *
+ * These functions are treated as ADL-extension points, the templates above
+ * call these functions without them having being pre-declared.  This works
+ * because ADL lookup finds the definitions of these functions when you pass
+ * the relevant arguments
+ */
+int futexWakeImpl(Futex<std::atomic>* futex, int count, uint32_t wakeMask);
+FutexResult futexWaitImpl(
+    Futex<std::atomic>* futex,
+    uint32_t expected,
+    std::chrono::system_clock::time_point const* absSystemTime,
+    std::chrono::steady_clock::time_point const* absSteadyTime,
+    uint32_t waitMask);
+
+int futexWakeImpl(
+    Futex<EmulatedFutexAtomic>* futex,
+    int count,
+    uint32_t wakeMask);
+FutexResult futexWaitImpl(
+    Futex<EmulatedFutexAtomic>* futex,
+    uint32_t expected,
+    std::chrono::system_clock::time_point const* absSystemTime,
+    std::chrono::steady_clock::time_point const* absSteadyTime,
+    uint32_t waitMask);
+
+template <typename Futex, typename Deadline>
+typename std::enable_if<Deadline::clock::is_steady, FutexResult>::type
+futexWaitImpl(
+    Futex* futex,
+    uint32_t expected,
+    Deadline const& deadline,
+    uint32_t waitMask) {
+  return futexWaitImpl(futex, expected, nullptr, &deadline, waitMask);
+}
+
+template <typename Futex, typename Deadline>
+typename std::enable_if<!Deadline::clock::is_steady, FutexResult>::type
+futexWaitImpl(
+    Futex* futex,
+    uint32_t expected,
+    Deadline const& deadline,
+    uint32_t waitMask) {
+  return futexWaitImpl(futex, expected, &deadline, nullptr, waitMask);
+}
+
+template <typename Futex>
+FutexResult futexWait(Futex* futex, uint32_t expected, uint32_t waitMask) {
+  auto rv = futexWaitImpl(futex, expected, nullptr, nullptr, waitMask);
+  assert(rv != FutexResult::TIMEDOUT);
+  return rv;
+}
+
+template <typename Futex>
+int futexWake(Futex* futex, int count, uint32_t wakeMask) {
+  return futexWakeImpl(futex, count, wakeMask);
+}
+
+template <typename Futex, class Clock, class Duration>
+FutexResult futexWaitUntil(
+    Futex* futex,
+    uint32_t expected,
+    std::chrono::time_point<Clock, Duration> const& deadline,
+    uint32_t waitMask) {
+  using Target = typename std::conditional<
+      Clock::is_steady,
+      std::chrono::steady_clock,
+      std::chrono::system_clock>::type;
+  auto const converted = time_point_conv<Target>(deadline);
+  return converted == Target::time_point::max()
+      ? futexWaitImpl(futex, expected, nullptr, nullptr, waitMask)
+      : futexWaitImpl(futex, expected, converted, waitMask);
+}
+
+} // namespace detail
+} // namespace folly
--- a/folly/detail/Futex.cpp
+++ b/folly/detail/Futex.cpp
@@ -222,47 +222,46 @@ FutexResult emulatedFutexWaitImpl(
 } // namespace

 /////////////////////////////////
-// Futex<> specializations
+// Futex<> overloads

-template <>
-int
-Futex<std::atomic>::futexWake(int count, uint32_t wakeMask) {
+int futexWakeImpl(Futex<std::atomic>* futex, int count, uint32_t wakeMask) {
 #ifdef __linux__
-  return nativeFutexWake(this, count, wakeMask);
+  return nativeFutexWake(futex, count, wakeMask);
 #else
-  return emulatedFutexWake(this, count, wakeMask);
+  return emulatedFutexWake(futex, count, wakeMask);
 #endif
 }

-template <>
-int
-Futex<EmulatedFutexAtomic>::futexWake(int count, uint32_t wakeMask) {
-  return emulatedFutexWake(this, count, wakeMask);
+int futexWakeImpl(
+    Futex<EmulatedFutexAtomic>* futex,
+    int count,
+    uint32_t wakeMask) {
+  return emulatedFutexWake(futex, count, wakeMask);
 }

-template <>
-FutexResult Futex<std::atomic>::futexWaitImpl(
+FutexResult futexWaitImpl(
+    Futex<std::atomic>* futex,
    uint32_t expected,
    system_clock::time_point const* absSystemTime,
    steady_clock::time_point const* absSteadyTime,
    uint32_t waitMask) {
 #ifdef __linux__
  return nativeFutexWaitImpl(
-      this, expected, absSystemTime, absSteadyTime, waitMask);
+      futex, expected, absSystemTime, absSteadyTime, waitMask);
 #else
  return emulatedFutexWaitImpl(
-      this, expected, absSystemTime, absSteadyTime, waitMask);
+      futex, expected, absSystemTime, absSteadyTime, waitMask);
 #endif
 }

-template <>
-FutexResult Futex<EmulatedFutexAtomic>::futexWaitImpl(
+FutexResult futexWaitImpl(
+    Futex<EmulatedFutexAtomic>* futex,
    uint32_t expected,
    system_clock::time_point const* absSystemTime,
    steady_clock::time_point const* absSteadyTime,
    uint32_t waitMask) {
  return emulatedFutexWaitImpl(
-      this, expected, absSystemTime, absSteadyTime, waitMask);
+      futex, expected, absSystemTime, absSteadyTime, waitMask);
 }

 } // namespace detail

--- a/folly/detail/Futex.h
+++ b/folly/detail/Futex.h
@@ -19,12 +19,14 @@
 #include <atomic>
 #include <cassert>
 #include <chrono>
+#include <cstdint>
 #include <limits>
 #include <type_traits>

 #include <folly/portability/Unistd.h>

-namespace folly { namespace detail {
+namespace folly {
+namespace detail {

 enum class FutexResult {
  VALUE_CHANGED, /* futex value didn't match expected */
@@ -41,110 +43,55 @@ enum class FutexResult {
 * If you don't know how to use futex(), you probably shouldn't be using
 * this class.  Even if you do know how, you should have a good reason
 * (and benchmarks to back you up).
+ *
+ * Because of the semantics of the futex syscall, the futex family of
+ * functions are available as free functions rather than member functions
 */
 template <template <typename> class Atom = std::atomic>
-struct Futex : Atom<uint32_t> {
-  Futex() : Atom<uint32_t>() {}
-
-  explicit constexpr Futex(uint32_t init) : Atom<uint32_t>(init) {}
-
-  /** Puts the thread to sleep if this->load() == expected.  Returns true when
-   *  it is returning because it has consumed a wake() event, false for any
-   *  other return (signal, this->load() != expected, or spurious wakeup). */
-  FutexResult futexWait(uint32_t expected, uint32_t waitMask = -1) {
-    auto rv = futexWaitImpl(expected, nullptr, nullptr, waitMask);
-    assert(rv != FutexResult::TIMEDOUT);
-    return rv;
-  }
-
-  /** Similar to futexWait but also accepts a deadline until when the wait call
-   *  may block.
-   *
-   *  Optimal clock types: std::chrono::system_clock, std::chrono::steady_clock.
-   *  NOTE: On some systems steady_clock is just an alias for system_clock,
-   *  and is not actually steady.
-   *
-   *  For any other clock type, now() will be invoked twice. */
-  template <class Clock, class Duration = typename Clock::duration>
-  FutexResult futexWaitUntil(
-      uint32_t expected,
-      std::chrono::time_point<Clock, Duration> const& deadline,
-      uint32_t waitMask = -1) {
-    using Target = typename std::conditional<
-        Clock::is_steady,
-        std::chrono::steady_clock,
-        std::chrono::system_clock>::type;
-    auto const converted = time_point_conv<Target>(deadline);
-    return converted == Target::time_point::max()
-        ? futexWaitImpl(expected, nullptr, nullptr, waitMask)
-        : futexWaitImpl(expected, converted, waitMask);
-  }
-
-  /** Wakens up to count waiters where (waitMask & wakeMask) !=
-   *  0, returning the number of awoken threads, or -1 if an error
-   *  occurred.  Note that when constructing a concurrency primitive
-   *  that can guard its own destruction, it is likely that you will
-   *  want to ignore EINVAL here (as well as making sure that you
-   *  never touch the object after performing the memory store that
-   *  is the linearization point for unlock or control handoff).
-   *  See https://sourceware.org/bugzilla/show_bug.cgi?id=13690 */
-  int futexWake(int count = std::numeric_limits<int>::max(),
-                uint32_t wakeMask = -1);
-
- private:
-  /** Optimal when TargetClock is the same type as Clock.
-   *
-   *  Otherwise, both Clock::now() and TargetClock::now() must be invoked. */
-  template <typename TargetClock, typename Clock, typename Duration>
-  static typename TargetClock::time_point time_point_conv(
-      std::chrono::time_point<Clock, Duration> const& time) {
-    using std::chrono::duration_cast;
-    using TimePoint = std::chrono::time_point<Clock, Duration>;
-    using TargetDuration = typename TargetClock::duration;
-    using TargetTimePoint = typename TargetClock::time_point;
-    if (time == TimePoint::max()) {
-      return TargetTimePoint::max();
-    } else if (std::is_same<Clock, TargetClock>::value) {
-      // in place of time_point_cast, which cannot compile without if-constexpr
-      auto const delta = time.time_since_epoch();
-      return TargetTimePoint(duration_cast<TargetDuration>(delta));
-    } else {
-      // different clocks with different epochs, so non-optimal case
-      auto const delta = time - Clock::now();
-      return TargetClock::now() + duration_cast<TargetDuration>(delta);
-    }
-  }
+using Futex = Atom<std::uint32_t>;

-  template <typename Deadline>
-  typename std::enable_if<Deadline::clock::is_steady, FutexResult>::type
-  futexWaitImpl(
-      uint32_t expected,
-      Deadline const& deadline,
-      uint32_t waitMask) {
-    return futexWaitImpl(expected, nullptr, &deadline, waitMask);
-  }
+/**
+ * Puts the thread to sleep if this->load() == expected.  Returns true when
+ * it is returning because it has consumed a wake() event, false for any
+ * other return (signal, this->load() != expected, or spurious wakeup).
+ */
+template <typename Futex>
+FutexResult futexWait(Futex* futex, uint32_t expected, uint32_t waitMask = -1);

-  template <typename Deadline>
-  typename std::enable_if<!Deadline::clock::is_steady, FutexResult>::type
-  futexWaitImpl(
-      uint32_t expected,
-      Deadline const& deadline,
-      uint32_t waitMask) {
-    return futexWaitImpl(expected, &deadline, nullptr, waitMask);
-  }
+/**
+ * Similar to futexWait but also accepts a deadline until when the wait call
+ * may block.
+ *
+ * Optimal clock types: std::chrono::system_clock, std::chrono::steady_clock.
+ * NOTE: On some systems steady_clock is just an alias for system_clock,
+ * and is not actually steady.
+ *
+ * For any other clock type, now() will be invoked twice.
+ */
+template <
+    typename Futex,
+    class Clock,
+    class Duration = typename Clock::duration>
+FutexResult futexWaitUntil(
+    Futex* futex,
+    uint32_t expected,
+    std::chrono::time_point<Clock, Duration> const& deadline,
+    uint32_t waitMask = -1);

-  /** Underlying implementation of futexWait and futexWaitUntil.
-   *  At most one of absSystemTime and absSteadyTime should be non-null.
-   *  Timeouts are separated into separate parameters to allow the
-   *  implementations to be elsewhere without templating on the clock
-   *  type, which is otherwise complicated by the fact that steady_clock
-   *  is the same as system_clock on some platforms. */
-  FutexResult futexWaitImpl(
-      uint32_t expected,
-      std::chrono::system_clock::time_point const* absSystemTime,
-      std::chrono::steady_clock::time_point const* absSteadyTime,
-      uint32_t waitMask);
-};
+/**
+ * Wakes up to count waiters where (waitMask & wakeMask) != 0, returning the
+ * number of awoken threads, or -1 if an error occurred.  Note that when
+ * constructing a concurrency primitive that can guard its own destruction, it
+ * is likely that you will want to ignore EINVAL here (as well as making sure
+ * that you never touch the object after performing the memory store that is
+ * the linearization point for unlock or control handoff).  See
+ * https://sourceware.org/bugzilla/show_bug.cgi?id=13690
+ */
+template <typename Futex>
+int futexWake(
+    Futex* futex,
+    int count = std::numeric_limits<int>::max(),
+    uint32_t wakeMask = -1);

 /** A std::atomic subclass that can be used to force Futex to emulate
 *  the underlying futex() syscall.  This is primarily useful to test or
@@ -158,27 +105,7 @@ struct EmulatedFutexAtomic : public std::atomic<T> {
  EmulatedFutexAtomic(EmulatedFutexAtomic&& rhs) = delete;
 };

-/* Available specializations, with definitions elsewhere */
-
-template <>
-int Futex<std::atomic>::futexWake(int count, uint32_t wakeMask);
-
-template <>
-FutexResult Futex<std::atomic>::futexWaitImpl(
-    uint32_t expected,
-    std::chrono::system_clock::time_point const* absSystemTime,
-    std::chrono::steady_clock::time_point const* absSteadyTime,
-    uint32_t waitMask);
-
-template <>
-int Futex<EmulatedFutexAtomic>::futexWake(int count, uint32_t wakeMask);
-
-template <>
-FutexResult Futex<EmulatedFutexAtomic>::futexWaitImpl(
-    uint32_t expected,
-    std::chrono::system_clock::time_point const* absSystemTime,
-    std::chrono::steady_clock::time_point const* absSteadyTime,
-    uint32_t waitMask);
-
 } // namespace detail
 } // namespace folly
+
+#include <folly/detail/Futex-inl.h>
--- a/folly/detail/MemoryIdler.h
+++ b/folly/detail/MemoryIdler.h
@@ -99,10 +99,10 @@ struct MemoryIdler {
  /// system with bursty requests. The default is to wait up to 50%
  /// extra, so on average 25% extra.
  template <
-      template <typename> class Atom,
+      typename Futex,
      typename IdleTime = std::chrono::steady_clock::duration>
  static FutexResult futexWait(
-      Futex<Atom>& fut,
+      Futex& fut,
      uint32_t expected,
      uint32_t waitMask = -1,
      IdleTime const& idleTimeout =
@@ -121,7 +121,9 @@ struct MemoryIdler {
            timeoutVariationFrac)) {
      return pre;
    }
-    return fut.futexWait(expected, waitMask);
+
+    using folly::detail::futexWait;
+    return futexWait(&fut, expected, waitMask);
  }

  /// Equivalent to fut.futexWaitUntil(expected, deadline, waitMask), but
@@ -133,11 +135,11 @@ struct MemoryIdler {
  /// system with bursty requests. The default is to wait up to 50%
  /// extra, so on average 25% extra.
  template <
-      template <typename> class Atom,
+      typename Futex,
      typename Deadline,
      typename IdleTime = std::chrono::steady_clock::duration>
  static FutexResult futexWaitUntil(
-      Futex<Atom>& fut,
+      Futex& fut,
      uint32_t expected,
      Deadline const& deadline,
      uint32_t waitMask = -1,
@@ -157,17 +159,16 @@ struct MemoryIdler {
            timeoutVariationFrac)) {
      return pre;
    }
-    return fut.futexWaitUntil(expected, deadline, waitMask);
+
+    using folly::detail::futexWaitUntil;
+    return futexWaitUntil(&fut, expected, deadline, waitMask);
  }

 private:
-  template <
-      template <typename> class Atom,
-      typename Deadline,
-      typename IdleTime>
+  template <typename Futex, typename Deadline, typename IdleTime>
  static bool futexWaitPreIdle(
      FutexResult& _ret,
-      Futex<Atom>& fut,
+      Futex& fut,
      uint32_t expected,
      Deadline const& deadline,
      uint32_t waitMask,
@@ -189,7 +190,8 @@ struct MemoryIdler {
    if (idleTimeout > IdleTime::zero()) {
      auto idleDeadline = Deadline::clock::now() + idleTimeout;
      if (idleDeadline < deadline) {
-        auto rv = fut.futexWaitUntil(expected, idleDeadline, waitMask);
+        using folly::detail::futexWaitUntil;
+        auto rv = futexWaitUntil(&fut, expected, idleDeadline, waitMask);
        if (rv != FutexResult::TIMEDOUT) {
          // finished before timeout hit, no flush
          _ret = rv;

--- a/folly/detail/TurnSequencer.h
+++ b/folly/detail/TurnSequencer.h
@@ -153,13 +153,13 @@ struct TurnSequencer {
        }
      }
      if (absTime) {
-        auto futexResult =
-            state_.futexWaitUntil(new_state, *absTime, futexChannel(turn));
+        auto futexResult = detail::futexWaitUntil(
+            &state_, new_state, *absTime, futexChannel(turn));
        if (futexResult == FutexResult::TIMEDOUT) {
          return TryWaitResult::TIMEDOUT;
        }
      } else {
-        state_.futexWait(new_state, futexChannel(turn));
+        detail::futexWait(&state_, new_state, futexChannel(turn));
      }
    }

@@ -202,8 +202,8 @@ struct TurnSequencer {
                 max_waiter_delta == 0 ? 0 : max_waiter_delta - 1);
      if (state_.compare_exchange_strong(state, new_state)) {
        if (max_waiter_delta != 0) {
-          state_.futexWake(std::numeric_limits<int>::max(),
-                           futexChannel(turn + 1));
+          detail::futexWake(
+              &state_, std::numeric_limits<int>::max(), futexChannel(turn + 1));
        }
        break;
      }

--- a/folly/experimental/EventCount.h
+++ b/folly/experimental/EventCount.h
@@ -142,8 +142,8 @@ inline void EventCount::notifyAll() noexcept {
 inline void EventCount::doNotify(int n) noexcept {
  uint64_t prev = val_.fetch_add(kAddEpoch, std::memory_order_acq_rel);
  if (UNLIKELY(prev & kWaiterMask)) {
-    (reinterpret_cast<detail::Futex<std::atomic>*>(&val_) + kEpochOffset)
-        ->futexWake(n);
+    detail::futexWake(
+        reinterpret_cast<detail::Futex<std::atomic>*>(&val_) + kEpochOffset, n);
  }
 }

@@ -162,8 +162,9 @@ inline void EventCount::cancelWait() noexcept {

 inline void EventCount::wait(Key key) noexcept {
  while ((val_.load(std::memory_order_acquire) >> kEpochShift) == key.epoch_) {
-    (reinterpret_cast<detail::Futex<std::atomic>*>(&val_) + kEpochOffset)
-        ->futexWait(key.epoch_);
+    detail::futexWait(
+        reinterpret_cast<detail::Futex<std::atomic>*>(&val_) + kEpochOffset,
+        key.epoch_);
  }
  // memory_order_relaxed would suffice for correctness, but the faster
  // #waiters gets to 0, the less likely it is that we'll do spurious wakeups

--- a/folly/experimental/FlatCombiningPriorityQueue.h
+++ b/folly/experimental/FlatCombiningPriorityQueue.h
@@ -320,7 +320,7 @@ FlatCombiningPriorityQueue<T, PriorityQueue, Mutex, Atom>::try_push_impl(

    if (res) {
      if (wake) {
-        empty_.futexWake();
+        detail::futexWake(&empty_);
      }
      return true;
    }
@@ -329,12 +329,12 @@ FlatCombiningPriorityQueue<T, PriorityQueue, Mutex, Atom>::try_push_impl(
    }
    while (isTrue(full_)) {
      if (when == std::chrono::time_point<Clock>::max()) {
-        full_.futexWait(1);
+        detail::futexWait(&full_, 1);
      } else {
        if (Clock::now() > when) {
          return false;
        } else {
-          full_.futexWaitUntil(1, when);
+          detail::futexWaitUntil(&full_, 1, when);
        }
      }
    } // inner while loop
@@ -369,18 +369,18 @@ FlatCombiningPriorityQueue<T, PriorityQueue, Mutex, Atom>::try_pop_impl(

    if (res) {
      if (wake) {
-        full_.futexWake();
+        detail::futexWake(&full_);
      }
      return true;
    }
    while (isTrue(empty_)) {
      if (when == std::chrono::time_point<Clock>::max()) {
-        empty_.futexWait(1);
+        detail::futexWait(&empty_, 1);
      } else {
        if (Clock::now() > when) {
          return false;
        } else {
-          empty_.futexWaitUntil(1, when);
+          detail::futexWaitUntil(&empty_, 1, when);
        }
      }
    } // inner while loop
@@ -415,12 +415,12 @@ FlatCombiningPriorityQueue<T, PriorityQueue, Mutex, Atom>::try_peek_impl(
    }
    while (isTrue(empty_)) {
      if (when == std::chrono::time_point<Clock>::max()) {
-        empty_.futexWait(1);
+        detail::futexWait(&empty_, 1);
      } else {
        if (Clock::now() > when) {
          return false;
        } else {
-          empty_.futexWaitUntil(1, when);
+          detail::futexWaitUntil(&empty_, 1, when);
        }
      }
    } // inner while loop

--- a/folly/experimental/RelaxedConcurrentPriorityQueue.h
+++ b/folly/experimental/RelaxedConcurrentPriorityQueue.h
@@ -1042,7 +1042,7 @@ class RelaxedConcurrentPriorityQueue {
      if (futex_array_[loc].compare_exchange_strong(curfutex, ready)) {
        if (curfutex &
            1) { // One or more consumers may be blocked on this futex
-          futex_array_[loc].futexWake();
+          detail::futexWake(&futex_array_[loc]);
        }
        return;
      } else {
@@ -1095,7 +1095,7 @@ class RelaxedConcurrentPriorityQueue {
    auto curfutex = futex_array_[loc].load(std::memory_order_acquire);
    if (curfutex &
        1) { /// The last round consumers are still waiting, go to sleep
-      futex_array_[loc].futexWait(curfutex);
+      detail::futexWait(&futex_array_[loc], curfutex);
    }
    if (trySpinBeforeBlock(
            curticket,
@@ -1106,12 +1106,12 @@ class RelaxedConcurrentPriorityQueue {
      curfutex = futex_array_[loc].load(std::memory_order_acquire);
      if (curfutex &
          1) { /// The last round consumers are still waiting, go to sleep
-        futex_array_[loc].futexWait(curfutex);
+        detail::futexWait(&futex_array_[loc], curfutex);
      } else if (!futexIsReady(curticket)) { // current ticket < pop ticket
        uint32_t blocking_futex = curfutex + 1;
        if (futex_array_[loc].compare_exchange_strong(
                curfutex, blocking_futex)) {
-          futex_array_[loc].futexWait(blocking_futex);
+          detail::futexWait(&futex_array_[loc], blocking_futex);
        }
      } else {
        return;

--- a/folly/fibers/Baton.cpp
+++ b/folly/fibers/Baton.cpp
@@ -24,6 +24,9 @@
 namespace folly {
 namespace fibers {

+using folly::detail::futexWaitUntil;
+using folly::detail::futexWake;
+
 void Baton::setWaiter(Waiter& waiter) {
  auto curr_waiter = waiter_.load();
  do {
@@ -120,8 +123,9 @@ bool Baton::timedWaitThread(TimeoutController::Duration timeout) {
          waiter_.compare_exchange_strong(waiter, THREAD_WAITING))) {
    auto deadline = TimeoutController::Clock::now() + timeout;
    do {
+      auto* futex = &futex_.futex;
      const auto wait_rv =
-          futex_.futex.futexWaitUntil(uint32_t(THREAD_WAITING), deadline);
+          futexWaitUntil(futex, uint32_t(THREAD_WAITING), deadline);
      if (wait_rv == folly::detail::FutexResult::TIMEDOUT) {
        return false;
      }
@@ -174,11 +178,11 @@ bool Baton::try_wait() {
 void Baton::postThread() {
  auto expected = THREAD_WAITING;

+  auto* futex = &futex_.futex;
  if (!waiter_.compare_exchange_strong(expected, POSTED)) {
    return;
  }
-
-  futex_.futex.futexWake(1);
+  futexWake(futex, 1);
 }

 void Baton::reset() {

--- a/folly/stats/test/QuantileEstimatorTest.cpp
+++ b/folly/stats/test/QuantileEstimatorTest.cpp
@@ -24,6 +24,7 @@ struct MockClock {
 public:
  using duration = std::chrono::steady_clock::duration;
  using time_point = std::chrono::steady_clock::time_point;
+  static constexpr auto is_steady = true;

  static time_point now() {
    return Now;

--- a/folly/synchronization/Baton.h
+++ b/folly/synchronization/Baton.h
@@ -154,7 +154,7 @@ class Baton {

    assert(before == WAITING);
    state_.store(LATE_DELIVERY, std::memory_order_release);
-    state_.futexWake(1);
+    detail::futexWake(&state_, 1);
  }

  /// Waits until post() has been called in the current Baton lifetime.

--- a/folly/synchronization/SaturatingSemaphore.h
+++ b/folly/synchronization/SaturatingSemaphore.h
@@ -268,7 +268,7 @@ FOLLY_NOINLINE void SaturatingSemaphore<MayBlock, Atom>::postSlowWaiterMayBlock(
            READY,
            std::memory_order_release,
            std::memory_order_relaxed)) {
-      state_.futexWake();
+      detail::futexWake(&state_);
      return;
    }
  }

--- a/folly/synchronization/detail/ThreadCachedInts.h
+++ b/folly/synchronization/detail/ThreadCachedInts.h
@@ -60,7 +60,7 @@ class ThreadCachedInts {
      ints_->orphan_dec_[1].fetch_add(
          dec_[1].load(std::memory_order_relaxed), std::memory_order_relaxed);
      ints_->waiting_.store(0, std::memory_order_release);
-      ints_->waiting_.futexWake();
+      detail::futexWake(&ints_->waiting_);
      // reset the cache_ on destructor so we can handle the delete/recreate
      cache_ = nullptr;
    }
@@ -102,7 +102,7 @@ class ThreadCachedInts {
    folly::asymmetricLightBarrier(); // C
    if (waiting_.load(std::memory_order_acquire)) {
      waiting_.store(0, std::memory_order_release);
-      waiting_.futexWake();
+      detail::futexWake(&waiting_);
    }
  }

@@ -150,7 +150,7 @@ class ThreadCachedInts {
      if (readFull(phase) == 0) {
        break;
      }
-      waiting_.futexWait(1);
+      detail::futexWait(&waiting_, 1);
    }
    waiting_.store(0, std::memory_order_relaxed);
  }

--- a/folly/synchronization/test/ParkingLotBenchmark.cpp
+++ b/folly/synchronization/test/ParkingLotBenchmark.cpp
@@ -39,7 +39,7 @@ BENCHMARK(FutexNoWaitersWake, iters) {
    t = std::thread([&]() {
      b.wait();
      for (auto i = 0u; i < iters; i++) {
-        fu.futexWake(1);
+        detail::futexWake(&fu, 1);
      }
    });
  }
@@ -82,7 +82,7 @@ BENCHMARK(FutexWakeOne, iters) {
    t = std::thread([&]() {
      b.wait();
      while (true) {
-        fu.futexWait(0);
+        detail::futexWait(&fu, 0);
        if (fu.load(std::memory_order_relaxed)) {
          return;
        }
@@ -92,10 +92,10 @@ BENCHMARK(FutexWakeOne, iters) {
  susp.dismiss();
  b.wait();
  for (auto i = 0u; i < iters; i++) {
-    fu.futexWake(1);
+    detail::futexWake(&fu, 1);
  }
  fu.store(1);
-  fu.futexWake(threads.size());
+  detail::futexWake(&fu, threads.size());

  for (auto& t : threads) {
    t.join();
@@ -148,7 +148,7 @@ BENCHMARK(FutexWakeAll, iters) {
    t = std::thread([&]() {
      b.wait();
      while (true) {
-        fu.futexWait(0);
+        detail::futexWait(&fu, 0);
        if (done.load(std::memory_order_relaxed)) {
          return;
        }
@@ -158,11 +158,11 @@ BENCHMARK(FutexWakeAll, iters) {
  susp.dismiss();
  b.wait();
  for (auto i = 0u; i < iters; i++) {
-    fu.futexWake(threads.size());
+    detail::futexWake(&fu, threads.size());
  }
  fu.store(1);
  done = true;
-  fu.futexWake(threads.size());
+  detail::futexWake(&fu, threads.size());

  for (auto& t : threads) {
    t.join();

--- a/folly/test/DeterministicSchedule.cpp
+++ b/folly/test/DeterministicSchedule.cpp
@@ -301,32 +301,28 @@ void DeterministicSchedule::wait(sem_t* sem) {
    // we're not busy waiting because this is a deterministic schedule
  }
 }
-} // namespace test
-} // namespace folly
-
-namespace folly {
-namespace detail {

-using namespace test;
-using namespace std::chrono;
-
-template <>
-FutexResult Futex<DeterministicAtomic>::futexWaitImpl(
+detail::FutexResult futexWaitImpl(
+    detail::Futex<DeterministicAtomic>* futex,
    uint32_t expected,
-    system_clock::time_point const* absSystemTimeout,
-    steady_clock::time_point const* absSteadyTimeout,
+    std::chrono::system_clock::time_point const* absSystemTimeout,
+    std::chrono::steady_clock::time_point const* absSteadyTimeout,
    uint32_t waitMask) {
+  using namespace test;
+  using namespace std::chrono;
+  using namespace folly::detail;
+
  bool hasTimeout = absSystemTimeout != nullptr || absSteadyTimeout != nullptr;
  bool awoken = false;
  FutexResult result = FutexResult::AWOKEN;

  DeterministicSchedule::beforeSharedAccess();
-  FOLLY_TEST_DSCHED_VLOG(this << ".futexWait(" << std::hex << expected
-                              << ", .., " << std::hex << waitMask
-                              << ") beginning..");
+  FOLLY_TEST_DSCHED_VLOG(
+      "futexWait(" << futex << ", " << std::hex << expected << ", .., "
+                   << std::hex << waitMask << ") beginning..");
  futexLock.lock();
-  if (this->data == expected) {
-    auto& queue = futexQueues[this];
+  if (futex->data == expected) {
+    auto& queue = futexQueues[futex];
    queue.emplace_back(waitMask, &awoken);
    auto ours = queue.end();
    ours--;
@@ -340,10 +336,10 @@ FutexResult Futex<DeterministicAtomic>::futexWaitImpl(
      // a 10% probability if we haven't been woken up already
      if (!awoken && hasTimeout &&
          DeterministicSchedule::getRandNumber(100) < 10) {
-        assert(futexQueues.count(this) != 0 && &futexQueues[this] == &queue);
+        assert(futexQueues.count(futex) != 0 && &futexQueues[futex] == &queue);
        queue.erase(ours);
        if (queue.empty()) {
-          futexQueues.erase(this);
+          futexQueues.erase(futex);
        }
        // Simulate ETIMEDOUT 90% of the time and other failures
        // remaining time
@@ -373,20 +369,25 @@ FutexResult Futex<DeterministicAtomic>::futexWaitImpl(
      resultStr = "VALUE_CHANGED";
      break;
  }
-  FOLLY_TEST_DSCHED_VLOG(this << ".futexWait(" << std::hex << expected
-                              << ", .., " << std::hex << waitMask << ") -> "
-                              << resultStr);
+  FOLLY_TEST_DSCHED_VLOG(
+      "futexWait(" << futex << ", " << std::hex << expected << ", .., "
+                   << std::hex << waitMask << ") -> " << resultStr);
  DeterministicSchedule::afterSharedAccess();
  return result;
 }

-template <>
-int Futex<DeterministicAtomic>::futexWake(int count, uint32_t wakeMask) {
+int futexWakeImpl(
+    detail::Futex<test::DeterministicAtomic>* futex,
+    int count,
+    uint32_t wakeMask) {
+  using namespace test;
+  using namespace std::chrono;
+
  int rv = 0;
  DeterministicSchedule::beforeSharedAccess();
  futexLock.lock();
-  if (futexQueues.count(this) > 0) {
-    auto& queue = futexQueues[this];
+  if (futexQueues.count(futex) > 0) {
+    auto& queue = futexQueues[futex];
    auto iter = queue.begin();
    while (iter != queue.end() && rv < count) {
      auto cur = iter++;
@@ -397,16 +398,21 @@ int Futex<DeterministicAtomic>::futexWake(int count, uint32_t wakeMask) {
      }
    }
    if (queue.empty()) {
-      futexQueues.erase(this);
+      futexQueues.erase(futex);
    }
  }
  futexLock.unlock();
-  FOLLY_TEST_DSCHED_VLOG(this << ".futexWake(" << count << ", " << std::hex
-                              << wakeMask << ") -> " << rv);
+  FOLLY_TEST_DSCHED_VLOG(
+      "futexWake(" << futex << ", " << count << ", " << std::hex << wakeMask
+                   << ") -> " << rv);
  DeterministicSchedule::afterSharedAccess();
  return rv;
 }
-} // namespace detail
+
+} // namespace test
+} // namespace folly
+
+namespace folly {

 template <>
 CacheLocality const& CacheLocality::system<test::DeterministicAtomic>() {
@@ -416,6 +422,6 @@ CacheLocality const& CacheLocality::system<test::DeterministicAtomic>() {

 template <>
 Getcpu::Func AccessSpreader<test::DeterministicAtomic>::pickGetcpuFunc() {
-  return &detail::DeterministicSchedule::getcpu;
+  return &test::DeterministicSchedule::getcpu;
 }
 } // namespace folly
--- a/folly/test/DeterministicSchedule.h
+++ b/folly/test/DeterministicSchedule.h
@@ -455,6 +455,18 @@ struct DeterministicAtomic {
  }
 };

+/* Futex extensions for DeterministicSchedule based Futexes */
+int futexWakeImpl(
+    detail::Futex<test::DeterministicAtomic>* futex,
+    int count,
+    uint32_t wakeMask);
+detail::FutexResult futexWaitImpl(
+    detail::Futex<test::DeterministicAtomic>* futex,
+    uint32_t expected,
+    std::chrono::system_clock::time_point const* absSystemTime,
+    std::chrono::steady_clock::time_point const* absSteadyTime,
+    uint32_t waitMask);
+
 /**
 * DeterministicMutex is a drop-in replacement of std::mutex that
 * cooperates with DeterministicSchedule.
@@ -504,23 +516,6 @@ struct DeterministicMutex {
  }
 };
 } // namespace test
-} // namespace folly
-
-/* Specialization declarations */
-
-namespace folly {
-namespace detail {
-
-template <>
-int Futex<test::DeterministicAtomic>::futexWake(int count, uint32_t wakeMask);
-
-template <>
-FutexResult Futex<test::DeterministicAtomic>::futexWaitImpl(
-    uint32_t expected,
-    std::chrono::system_clock::time_point const* absSystemTime,
-    std::chrono::steady_clock::time_point const* absSteadyTime,
-    uint32_t waitMask);
-} // namespace detail

 template <>
 Getcpu::Func AccessSpreader<test::DeterministicAtomic>::pickGetcpuFunc();

--- a/folly/test/FutexTest.cpp
+++ b/folly/test/FutexTest.cpp
@@ -39,19 +39,19 @@ typedef DeterministicSchedule DSched;
 template <template <typename> class Atom>
 void run_basic_thread(
    Futex<Atom>& f) {
-  EXPECT_EQ(FutexResult::AWOKEN, f.futexWait(0));
+  EXPECT_EQ(FutexResult::AWOKEN, futexWait(&f, 0));
 }

 template <template <typename> class Atom>
 void run_basic_tests() {
  Futex<Atom> f(0);

-  EXPECT_EQ(FutexResult::VALUE_CHANGED, f.futexWait(1));
-  EXPECT_EQ(f.futexWake(), 0);
+  EXPECT_EQ(FutexResult::VALUE_CHANGED, futexWait(&f, 1));
+  EXPECT_EQ(futexWake(&f), 0);

  auto thr = DSched::thread(std::bind(run_basic_thread<Atom>, std::ref(f)));

-  while (f.futexWake() != 1) {
+  while (futexWake(&f) != 1) {
    std::this_thread::yield();
  }

@@ -68,7 +68,7 @@ void liveClockWaitUntilTests() {
      while (true) {
        const auto deadline = time_point_cast<Duration>(
            Clock::now() + microseconds(1 << (stress % 20)));
-        const auto res = fp->futexWaitUntil(0, deadline);
+        const auto res = futexWaitUntil(fp, 0, deadline);
        EXPECT_TRUE(res == FutexResult::TIMEDOUT || res == FutexResult::AWOKEN);
        if (res == FutexResult::AWOKEN) {
          break;
@@ -76,7 +76,7 @@ void liveClockWaitUntilTests() {
      }
    });

-    while (f.futexWake() != 1) {
+    while (futexWake(&f) != 1) {
      std::this_thread::yield();
    }

@@ -86,7 +86,7 @@ void liveClockWaitUntilTests() {
  {
    const auto start = Clock::now();
    const auto deadline = time_point_cast<Duration>(start + milliseconds(100));
-    EXPECT_EQ(f.futexWaitUntil(0, deadline), FutexResult::TIMEDOUT);
+    EXPECT_EQ(futexWaitUntil(&f, 0, deadline), FutexResult::TIMEDOUT);
    LOG(INFO) << "Futex wait timed out after waiting for "
              << duration_cast<milliseconds>(Clock::now() - start).count()
              << "ms using clock with " << Duration::period::den
@@ -97,7 +97,7 @@ void liveClockWaitUntilTests() {
    const auto start = Clock::now();
    const auto deadline = time_point_cast<Duration>(
        start - 2 * start.time_since_epoch());
-    EXPECT_EQ(f.futexWaitUntil(0, deadline), FutexResult::TIMEDOUT);
+    EXPECT_EQ(futexWaitUntil(&f, 0, deadline), FutexResult::TIMEDOUT);
    LOG(INFO) << "Futex wait with invalid deadline timed out after waiting for "
              << duration_cast<milliseconds>(Clock::now() - start).count()
              << "ms using clock with " << Duration::period::den
@@ -111,7 +111,7 @@ void deterministicAtomicWaitUntilTests() {

  // Futex wait must eventually fail with either FutexResult::TIMEDOUT or
  // FutexResult::INTERRUPTED
-  const auto res = f.futexWaitUntil(0, Clock::now() + milliseconds(100));
+  const auto res = futexWaitUntil(&f, 0, Clock::now() + milliseconds(100));
  EXPECT_TRUE(res == FutexResult::TIMEDOUT || res == FutexResult::INTERRUPTED);
 }

@@ -192,10 +192,10 @@ void run_wake_blocked_test() {
    bool success = false;
    Futex<Atom> f(0);
    auto thr = DSched::thread(
-        [&] { success = FutexResult::AWOKEN == f.futexWait(0); });
+        [&] { success = FutexResult::AWOKEN == futexWait(&f, 0); });
    /* sleep override */ std::this_thread::sleep_for(delay);
    f.store(1);
-    f.futexWake(1);
+    futexWake(&f, 1);
    DSched::join(thr);
    LOG(INFO) << "delay=" << delay.count() << "_ms, success=" << success;
    if (success) {

--- a/folly/test/MemoryIdlerTest.cpp
+++ b/folly/test/MemoryIdlerTest.cpp
@@ -49,16 +49,6 @@ TEST(MemoryIdler, releaseMallocTLS) {
  delete[] p;
 }

-
-/// MockedAtom gives us a way to select a mocked Futex implementation
-/// inside Baton, even though the atom itself isn't exercised by the
-/// mocked futex
-template <typename T>
-struct MockAtom : public std::atomic<T> {
-  explicit MockAtom(T init = 0) : std::atomic<T>(init) {}
-};
-
-
 /// MockClock is a bit tricky because we are mocking a static function
 /// (now()), so we need to find the corresponding mock instance without
 /// extending its scope beyond that of the test.  I generally avoid
@@ -84,29 +74,39 @@ struct MockClock {
 };

 std::weak_ptr<StrictMock<MockClock>> MockClock::s_mockClockInstance;
+static auto const forever = MockClock::time_point::max();

+/// MockedAtom gives us a way to select a mocked Futex implementation
+/// inside Baton, even though the atom itself isn't exercised by the
+/// mocked futex
+///
+/// Futex<MockAtom> is our mocked futex implementation.  Note that the method
+/// signatures differ from the real Futex because we have elided unused default
+/// params and collapsed templated methods into the used type
+template <typename T>
+struct MockAtom : public std::atomic<T> {
+  explicit MockAtom(T init = 0) : std::atomic<T>(init) {}

-
-namespace folly { namespace detail {
-
-/// Futex<MockAtom> is our mocked futex implementation.  Note that the
-/// method signatures differ from the real Futex because we have elided
-/// unused default params and collapsed templated methods into the
-/// used type
-template <>
-struct Futex<MockAtom> {
  MOCK_METHOD2(futexWait, FutexResult(uint32_t, uint32_t));
  MOCK_METHOD3(futexWaitUntil,
               FutexResult(uint32_t, const MockClock::time_point&, uint32_t));
 };

-} // namespace detail
-} // namespace folly
-
-static auto const forever = MockClock::time_point::max();
+FutexResult
+futexWait(Futex<MockAtom>* futex, uint32_t expected, uint32_t waitMask) {
+  return futex->futexWait(expected, waitMask);
+}
+template <typename Clock, typename Duration>
+FutexResult futexWaitUntil(
+    Futex<MockAtom>* futex,
+    std::uint32_t expected,
+    std::chrono::time_point<Clock, Duration> const& deadline,
+    uint32_t waitMask) {
+  return futex->futexWaitUntil(expected, deadline, waitMask);
+}

 TEST(MemoryIdler, futexWaitValueChangedEarly) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();
  auto begin = MockClock::time_point(std::chrono::seconds(100));
  auto idleTimeout = MemoryIdler::defaultIdleTimeout.load();
@@ -121,7 +121,7 @@ TEST(MemoryIdler, futexWaitValueChangedEarly) {
 }

 TEST(MemoryIdler, futexWaitValueChangedLate) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();
  auto begin = MockClock::time_point(std::chrono::seconds(100));
  auto idleTimeout = MemoryIdler::defaultIdleTimeout.load();
@@ -138,7 +138,7 @@ TEST(MemoryIdler, futexWaitValueChangedLate) {
 }

 TEST(MemoryIdler, futexWaitAwokenEarly) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();
  auto begin = MockClock::time_point(std::chrono::seconds(100));
  auto idleTimeout = MemoryIdler::defaultIdleTimeout.load();
@@ -151,7 +151,7 @@ TEST(MemoryIdler, futexWaitAwokenEarly) {
 }

 TEST(MemoryIdler, futexWaitAwokenLate) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();
  auto begin = MockClock::time_point(std::chrono::seconds(100));
  auto idleTimeout = MemoryIdler::defaultIdleTimeout.load();
@@ -168,7 +168,7 @@ TEST(MemoryIdler, futexWaitAwokenLate) {
 }

 TEST(MemoryIdler, futexWaitImmediateFlush) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();

  EXPECT_CALL(fut, futexWaitUntil(2, forever, 0xff))
@@ -180,7 +180,7 @@ TEST(MemoryIdler, futexWaitImmediateFlush) {
 }

 TEST(MemoryIdler, futexWaitNeverFlush) {
-  StrictMock<Futex<MockAtom>> fut;
+  Futex<MockAtom> fut;
  auto clock = MockClock::setup();

  EXPECT_CALL(fut, futexWaitUntil(1, forever, -1))