Add a coinflip library.

Summary: The naive coinflip algorithm generates a random variable each time the function is called. We can avoid many of these calls to a random number generator by doing some tricky business with math stuff. See https://fb.workplace.com/groups/135724786501553/permalink/5767297086677600/ for some discussion on the topic. The intention behind this code is that we will use it in HHVM. That's why it provides the static methods that allow the caller to supply their own storage space for the counter and their own random number generator. Reviewed By: bmaurer Differential Revision: D30440295 fbshipit-source-id: ee0c3e49da3ec82a9fd6c8854b15ef107755ff39

Add a coinflip library.
Summary: The naive coinflip algorithm generates a random variable each time the function is called. We can avoid many of these calls to a random number generator by doing some tricky business with math stuff. See https://fb.workplace.com/groups/135724786501553/permalink/5767297086677600/ for some discussion on the topic. The intention behind this code is that we will use it in HHVM. That's why it provides the static methods that allow the caller to supply their own storage space for the counter and their own random number generator. Reviewed By: bmaurer Differential Revision: D30440295 fbshipit-source-id: ee0c3e49da3ec82a9fd6c8854b15ef107755ff39
d2643855 · Logan Evans · Facebook GitHub Bot · 47844d92 · d2643855 · d2643855
Commit d2643855 authored Sep 07, 2021 by Logan Evans Committed by Facebook GitHub Bot Sep 07, 2021
3 changed files
--- a/folly/experimental/Coinflip.h
+++ b/folly/experimental/Coinflip.h
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// @author: Logan Evans <lpe@fb.com>
+#pragma once
+#include <cfloat>
+#include <cmath>
+#include <folly/CPortability.h>
+#include <folly/ConstexprMath.h>
+#include <folly/Likely.h>
+#include <folly/Random.h>
+namespace folly {
+class Coinflip {
+ public:
+  // Set *counter = 0 to have it be initialized by this function. It is correct
+  // to reuse the same *counter after calling cointflip, even for unrelated
+  // callsites and different wait values.
+  template <class RNG = ThreadLocalPRNG>
+  inline static bool coinflip(
+      uint64_t* counter, uint32_t wait, RNG&& rng = ThreadLocalPRNG()) {
+    if (wait <= kMaxWaitNaive) {
+      // It's convention for coinflip(0) to return false and coinflip(1) to
+      // return true.
+      if (wait <= 1) {
+        return wait == 1;
+      }
+      return coinflip_naive(wait, std::forward<RNG>(rng));
+    }
+    // The value *counter is sampled from an exponential distribution, so we can
+    // use the memoryless property Pr(T > s + t | T > s) = Pr(T > t). In other
+    // words, if we subtract any amount s from *counter and *counter is still
+    // positive, then the expected wait time will be unchanged. If we resample
+    // *counter from that exponential distribution, the expected wait time will
+    // also be unchanged. What we are going to do is pick an amount to subtract
+    // from the counter that is quick to calculate. Once we detect a possible
+    // event using an over-approximation, we will perform a second calculation
+    // to check if we would have likewise produced an event using the precise
+    // step size.
+    //
+    // In order to facilitate a quick approximation, we will use
+    //   X ~ Exp(2^k)
+    // as our distribution. See kScaleBits for k.
+    //
+    // After every pass through this coinflip function, the value *counter is
+    // still a valid sample from Exp(2^k), which means that for any value wait
+    // passed into the function, we can produce a positive signal with the
+    // correct probability. That means that if one code location calls
+    // coinflip(1000) nine times, we can then use the same counter for a call
+    // to coinflip(100) in an unrelated code location.
+    //
+    // Since value *counter was sampled from an Exp(2^k) distribution, it
+    // represents the wait time until the next event. We can calculate the
+    // precise wait time by using the CDF for an exponential distribution,
+    //   CDF(step | scale=2^k) = 1 - e^(-step/(2^k)).
+    // We set the CDF to the expected probability of 1/wait and solve for step:
+    //   1/wait = 1 - e^(-step/(2^k))
+    //   step = -ln(1 - 1/wait) * 2^k.
+    //
+    // However, this is expensive to calculate on the fast path, so instead we
+    // find an approximation for the step size by using an approximation for
+    // the ln function. The derivative of ln(x) is 1/x. We can imagine placing
+    // a line tangent to the graph of ln(x) with a slope of 1/x, and using that
+    // line for our approximation. In order to avoid divisions later on, we
+    // will only use slopes where x is a power of 2. That gives us
+    //   ln(x) ~= mx+b = (1/2^c)x + b
+    // where c is some constant and b won't end up mattering.
+    //
+    // The value we are approximating is
+    //   ln(1 - 1/wait) = ln((wait-1)/wait) = ln(wait-1) - ln(wait).
+    //
+    // Since we are approximating ln(x) using a line, the result is equal to the
+    // inverse of the slope we are using for an estimate. That is,
+    //   ln(wait-1) - ln(wait) ~= ((1/2^c)(wait-1) + b) - ((1/2^c)(wait) + b)
+    //                          = -1/(2^c).
+    //
+    // This gives us
+    //   step  = -ln(1 - 1/wait) * 2^k
+    //   step ~= (2^k) / (2^c).
+    //
+    // As long as our step estimate is greater than the precise step, we will
+    // generate false positives. If instead the step estimate were less than the
+    // precise step, we would have false negatives, which we can't correct. To
+    // make sure our step estimate is greater than the precise step, we need to
+    // use the slope approximation for the value wait-1.
+    //
+    // That slope approximation is equal to the number of bits needed to encode
+    // the value, or the floor(lg(x)) function (integer log base 2). We then
+    // multiply by the scale of 2^k for the exponential distribution, which is
+    // some constant that depends on our choice of k plus the number of leading
+    // zeros needed to encode wait-1 as a 32 bit integer.
+    const uint64_t step_approx = kStepApproxBase << __builtin_clz(wait - 1);
+    if (FOLLY_UNLIKELY(*counter <= step_approx)) {
+      return possible_event(counter, wait, std::forward<RNG>(rng));
+    }
+    *counter -= step_approx;
+    return false;
+  }
+  template <class RNG = ThreadLocalPRNG>
+  explicit Coinflip(RNG&& rng = ThreadLocalPRNG())
+      : counter_(sample_new_counter(std::forward<RNG>(rng))) {}
+  template <class RNG = ThreadLocalPRNG>
+  bool coinflip(uint32_t wait, RNG&& rng = ThreadLocalPRNG()) {
+    return Coinflip::coinflip(&counter_, wait, std::forward<RNG>(rng));
+  }
+ private:
+  // Benchmarks show that the overhead of dealing with false positives is
+  // more expensive than the naive algorithm up through a wait of 8.
+  static constexpr int kMaxWaitNaive = 8;
+  // kScaleBits value needs to leave enough space in a 64-bit value for
+  // -log(DBL_EPSILON) * kScale to avoid an overflow while sampling a new
+  // counter. We use the inequality chain
+  //   abs(log(DBL_EPSILON)) < abs(log2(DBL_EPSILON)) <= DBL_MAX_EXP
+  // and then count the bits needed to encode DBL_MAX_EXP to decide on the
+  // number of bits.
+  static constexpr int kScaleBits =
+      64 - folly::constexpr_log2_ceil(DBL_MAX_EXP); // 54 on Skylake.
+  static constexpr uint64_t kScale = 1ul << kScaleBits;
+  static constexpr uint64_t kStepApproxBase = 1ul << (kScaleBits - 31);
+  // Since counter_ is an exponential random variable, it is memoryless. That
+  // means that if we subtract any value from counter_ and either confirm that
+  // counter_ is greater than 0 or else generate a new value for counter_, then
+  // the expected value of counter_ is the same at the beginning and at the end
+  // of that process. This allows us to use counter_ to support multiple wait
+  // times.
+  uint64_t counter_;
+  template <class RNG = ThreadLocalPRNG>
+  static uint64_t sample_new_counter(RNG&& rng = ThreadLocalPRNG()) {
+    // This uses the principle that the CDF of an exponential distribution is
+    //   f(x) = 1 - e^(-x/scale)
+    //   x = -scale * ln(1 - f(x)).
+    // When
+    //   f(x) ~ Uniform(0, 1)
+    // then
+    //   x ~ Exp(scale).
+    //
+    // The + 1 at the end of the expression is used to make sure that the new
+    // counter cannot be 0. This allows the code to detect that counter == 0
+    // is a user-initialized state and that this shouldn't automatically
+    // generate an event.
+    double U = Random::randDouble01(std::forward<RNG>(rng));
+    return -log(1 - U) * kScale + 1;
+  }
+  template <class RNG = ThreadLocalPRNG>
+  FOLLY_NOINLINE static bool coinflip_naive(uint32_t wait, RNG&& rng) {
+    return Random::rand32(wait, std::forward<RNG>(rng)) == 0;
+  }
+  template <class RNG = ThreadLocalPRNG>
+  FOLLY_NOINLINE static bool possible_event(
+      uint64_t* counter, uint32_t wait, RNG&& rng) {
+    const uint64_t old_counter = *counter;
+    *counter = sample_new_counter(std::forward<RNG>(rng));
+    if (old_counter == 0) {
+      // Since sample_new_counter will not produce a value of 0, this branch
+      // will only happen when *counter is manually initialized to 0.
+      return coinflip_naive(wait, std::forward<RNG>(rng));
+    }
+    const uint64_t step_precise = -log(1 - 1.0 / wait) * kScale;
+    // The false positive rate is close to 50% when wait is a power of 2 and
+    // is close to 0% when wait-1 is a power of 2.
+    return old_counter <= step_precise;
+  }
+};
+} // namespace folly
--- a/folly/experimental/test/CoinflipBenchmark.cpp
+++ b/folly/experimental/test/CoinflipBenchmark.cpp
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <folly/Benchmark.h>
+#include <folly/CPortability.h>
+#include <folly/Random.h>
+#include <folly/experimental/Coinflip.h>
+namespace folly {
+FOLLY_NOINLINE
+bool coinflip_naive(uint32_t wait) {
+  if (wait <= 1) {
+    return wait == 1;
+  }
+  return Random::rand32(wait) == 0;
+}
+FOLLY_NOINLINE bool coinflip_noinline(uint64_t* counter, uint32_t wait) {
+  return Coinflip::coinflip(
+      /*counter=*/counter,
+      /*wait=*/wait,
+      /*rng=*/folly::ThreadLocalPRNG());
+}
+#define BENCH_BOTH(WAIT)                                    \
+  BENCHMARK(coinflip_##WAIT##_naive, n) {                   \
+    while (n--) {                                           \
+      doNotOptimizeAway(coinflip_naive(WAIT));              \
+    }                                                       \
+  }                                                         \
+  BENCHMARK_RELATIVE(coinflip_##WAIT##_folly, n) {          \
+    uint64_t counter = 0;                                   \
+    while (n--) {                                           \
+      doNotOptimizeAway(coinflip_noinline(&counter, WAIT)); \
+    }                                                       \
+  }
+BENCH_BOTH(2)
+BENCH_BOTH(3)
+BENCH_BOTH(4)
+BENCH_BOTH(5)
+BENCH_BOTH(7)
+BENCH_BOTH(8)
+BENCH_BOTH(9)
+BENCH_BOTH(10)
+BENCH_BOTH(11)
+BENCH_BOTH(12)
+BENCH_BOTH(13)
+BENCH_BOTH(14)
+BENCH_BOTH(15)
+BENCH_BOTH(16)
+BENCH_BOTH(17)
+BENCH_BOTH(31)
+BENCH_BOTH(32)
+BENCH_BOTH(33)
+BENCH_BOTH(63)
+BENCH_BOTH(64)
+BENCH_BOTH(65)
+BENCH_BOTH(127)
+BENCH_BOTH(128)
+BENCH_BOTH(129)
+BENCH_BOTH(255)
+BENCH_BOTH(256)
+BENCH_BOTH(257)
+BENCH_BOTH(511)
+BENCH_BOTH(512)
+BENCH_BOTH(513)
+BENCH_BOTH(1023)
+BENCH_BOTH(1024)
+BENCH_BOTH(1025)
+BENCH_BOTH(2047)
+BENCH_BOTH(2048)
+BENCH_BOTH(2049)
+BENCH_BOTH(100)
+BENCH_BOTH(1000)
+BENCH_BOTH(10000)
+BENCH_BOTH(100000)
+BENCH_BOTH(1000000)
+BENCH_BOTH(10000000)
+} // namespace folly
+int main(int argc, char** argv) {
+  gflags::ParseCommandLineFlags(&argc, &argv, true);
+  folly::runBenchmarks();
+  return 0;
+}
--- a/folly/experimental/test/CoinflipTest.cpp
+++ b/folly/experimental/test/CoinflipTest.cpp
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <float.h>
+#include <math.h>
+#include <glog/logging.h>
+#include <folly/ConstexprMath.h>
+#include <folly/experimental/Coinflip.h>
+#include <folly/portability/GTest.h>
+// Source:
+// https://stackoverflow.com/questions/2328258/cumulative-normal-distribution-function-in-c-c/18786808#18786808
+double normalCDF(double value) {
+  return 0.5 * erfc(-value * M_SQRT1_2);
+}
+bool is_biased(
+    int64_t num_successes,
+    int64_t num_trials,
+    double p,
+    double threshold = 1e-6) {
+  if (p == 0 || p == 1) {
+    // This test won't work on the boundaries, so give a failure result to
+    // prompt debugging.
+    return true;
+  }
+  const double x = static_cast<double>(num_trials) / num_successes;
+  // mu is the mean of an exponential distribution with parameter p.
+  const double mu = 1 / p;
+  // sigma is the standard deviation of an exponential distribution with
+  // parameter p.
+  const double sigma = std::sqrt(1 - p) / p;
+  // The calculation for z comes from the central limit theorem. Specifically,
+  // when the underlying distribution has mean mu and standard deviation sigma,
+  // then sqrt(num_successes) * (x - mu) ~ Normal(0, sigma^2). The z-score is
+  // normalized to a Normal(0, 1) distribution.
+  const double z = std::sqrt(num_successes) * (x - mu) / sigma;
+  const double cdf = normalCDF(z);
+  if (cdf < threshold / 2 || cdf > 1 - threshold / 2) {
+    return true;
+  }
+  return false;
+}
+TEST(coinflip_test, ctor) {
+  folly::Coinflip coin0;
+}
+bool experiment(uint32_t wait, uint32_t num_successes = 1000000) {
+  const double p = 1.0 / wait;
+  int64_t num_trials = 0;
+  uint64_t counter = 0;
+  // We're going to cheat and look into the implementation a little so that we
+  // can gather enough values to perform a powerful statistical analysis. The
+  // idea is that if we know the step size, then we can calculate how many
+  // times we would need to call coinflip to possibly get the next true return.
+  // We will then jump ahead that many iterations.
+  uint64_t step = 0;
+  while (step == 0) {
+    uint64_t last_counter = counter;
+    if (!folly::Coinflip::coinflip(
+            /*counter=*/&counter,
+            /*wait=*/wait,
+            /*rng=*/folly::ThreadLocalPRNG())) {
+      if (last_counter != 0) {
+        step = last_counter - counter;
+      }
+    } else if (counter == 0) {
+      // The implementation doesn't appear to be doing anything with the
+      // counter.
+      return false;
+    }
+  }
+  for (uint32_t i = 0; i < num_successes; /* i updated in loop */) {
+    if (counter > step) {
+      // Avoid the leaving the counter at 0 since the implementation might
+      // or might not treat that case specially.
+      int64_t num_failures = counter / step - 1;
+      counter -= step * num_failures;
+      num_trials += num_failures;
+    }
+    for (uint32_t x = 0; x < 2; x++) {
+      // Do this twice because the first iteration might leave counter = 0,
+      // which might be handled differently in various implementations.
+      if (folly::Coinflip::coinflip(
+              /*counter=*/&counter,
+              /*wait=*/wait,
+              /*rng=*/folly::ThreadLocalPRNG())) {
+        i++;
+      }
+      num_trials++;
+    }
+  }
+  return is_biased(num_successes, num_trials, p);
+}
+#define TEST_BIAS(WAIT)                                                \
+  TEST(coinflip_test, bias_##WAIT) {                                   \
+    EXPECT_FALSE(experiment(/*wait=*/WAIT, /*num_successes=*/100000)); \
+  }
+TEST_BIAS(8)
+TEST_BIAS(9)
+TEST_BIAS(17)
+TEST_BIAS(31)
+TEST_BIAS(32)
+TEST_BIAS(33)
+TEST_BIAS(63)
+TEST_BIAS(64)
+TEST_BIAS(65)
+TEST_BIAS(127)
+TEST_BIAS(128)
+TEST_BIAS(129)
+TEST_BIAS(255)
+TEST_BIAS(256)
+TEST_BIAS(257)
+TEST_BIAS(511)
+TEST_BIAS(512)
+TEST_BIAS(513)
+TEST_BIAS(1023)
+TEST_BIAS(1024)
+TEST_BIAS(1025)
+TEST_BIAS(2047)
+TEST_BIAS(2048)
+TEST_BIAS(2049)
+TEST_BIAS(100)
+TEST_BIAS(1000)
+TEST_BIAS(10000)
+TEST_BIAS(100000)
+TEST_BIAS(1000000)
+TEST_BIAS(10000000)
+TEST_BIAS(UINT32_MAX)
+// Bias would creep in if the approximate step is smaller than the true step,
+// since that would lead to false negatives. We need to make sure that
+// the algorithm used to compute the approximate step is always greater than
+// or equal to the precise step.
+static constexpr int kScaleBits = 54;
+static constexpr uint64_t kScale = 1ul << kScaleBits;
+static constexpr uint64_t kStepApproxBase = 1ul << (kScaleBits - 31);
+uint64_t step_approx(uint32_t wait) {
+  return kStepApproxBase << __builtin_clz(wait - 1);
+}
+uint64_t step_exact(uint32_t wait) {
+  return -log(1 - 1.0 / wait) * kScale;
+}
+TEST(coinflip_test, approx_ge_exact) {
+  for (uint32_t power = 2; power < 32; power++) {
+    uint64_t wait = 1ul << power;
+    ASSERT_GE(step_approx(wait - 1), step_exact(wait - 1));
+    ASSERT_GE(step_approx(wait), step_exact(wait));
+    ASSERT_GE(step_approx(wait + 1), step_exact(wait + 1));
+  }
+  ASSERT_GE(step_approx(UINT32_MAX - 1), step_exact(UINT32_MAX - 1));
+  ASSERT_GE(step_approx(UINT32_MAX), step_exact(UINT32_MAX));
+}
+int main(int argc, char* argv[]) {
+  testing::InitGoogleTest(&argc, argv);
+  gflags::ParseCommandLineFlags(&argc, &argv, true);
+  return RUN_ALL_TESTS();
+}