Commit d6488e3c authored by Hans Fugal, committed by Andrii Grynenko

serial/parallel benchmark

Summary: This is an attempt to see the effect of lock contention in the less-common case where a promise is fulfilled while the future is still being set up. Although this is expected to be off the beaten path, it is still a good idea to keep an eye on how perf improves or worsens as we play with different locking strategies (which I intend to do a lot of in the near term).
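For context, here is a minimal standalone sketch (not part of this diff; contendedPair and the increment lambda are illustrative, and Try<int>::value() is assumed to behave as in folly::Try) of the contended pattern the new benchmark exercises: one thread attaches a callback with then() while another thread fulfils the promise with setValue(), so both race on the shared state's lock.

#include <folly/wangle/Future.h>
#include <folly/wangle/Promise.h>
#include <thread>

using namespace folly::wangle;

void contendedPair() {
  Promise<int> p;
  auto f = p.getFuture();
  // Consumer: attach a callback; this may run concurrently with setValue().
  std::thread consumer([&] {
    f.then([](Try<int>&& t) { return t.value() + 1; });
  });
  // Producer: fulfil the promise, contending for the shared-state lock.
  std::thread producer([&] { p.setValue(42); });
  consumer.join();
  producer.join();
}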

Test Plan:
============================================================================
folly/wangle/test/Benchmark.cpp                 relative  time/iter  iters/s
============================================================================
constantFuture                                             249.31ns    4.01M
promiseAndFuture                                 100.88%   247.13ns    4.05M
withThen                                          43.87%   568.30ns    1.76M
----------------------------------------------------------------------------
oneThen                                                    569.62ns    1.76M
twoThens                                          63.46%   897.60ns    1.11M
fourThens                                         39.64%     1.44us  695.90K
hundredThens                                       1.91%    29.75us   33.61K
----------------------------------------------------------------------------
serial                                                       4.97ms   201.21
parallel                                          60.22%     8.25ms   121.18
============================================================================

Reviewed By: jsedgwick@fb.com

Subscribers: meisner, net-systems@, fugalh, exa, njormrod

FB internal diff: D1593010
parent d3ee675f
@@ -14,11 +14,15 @@
 * limitations under the License.
 */

#include <folly/Baton.h>
#include <folly/Benchmark.h>
#include <folly/wangle/Future.h>
#include <folly/wangle/Promise.h>
#include <semaphore.h>
#include <vector>

using namespace folly::wangle;
using namespace std;

template <class T>
T incr(Try<T>&& t) {

@@ -74,6 +78,88 @@ BENCHMARK_RELATIVE(hundredThens) {
  someThens(100);
}

// Lock contention. Although in practice fulfil()s tend to be temporally
// separate from then()s, still sometimes they will be concurrent. So the
// higher this number is, the better.
BENCHMARK_DRAW_LINE()
BENCHMARK(no_contention) {
  vector<Promise<int>> promises(10000);
  vector<Future<int>> futures;
  std::thread producer, consumer;

  BENCHMARK_SUSPEND {
    folly::Baton<> b1, b2;
    for (auto& p : promises)
      futures.push_back(p.getFuture());

    consumer = std::thread([&]{
      b1.post();
      for (auto& f : futures) f.then(incr<int>);
    });
    consumer.join();

    producer = std::thread([&]{
      b2.post();
      for (auto& p : promises) p.setValue(42);
    });

    b1.wait();
    b2.wait();
  }

  // The only thing we are measuring is how long fulfil + callbacks take
  producer.join();
}

BENCHMARK_RELATIVE(contention) {
  vector<Promise<int>> promises(10000);
  vector<Future<int>> futures;
  std::thread producer, consumer;
  sem_t sem;
  sem_init(&sem, 0, 0);

  BENCHMARK_SUSPEND {
    folly::Baton<> b1, b2;
    for (auto& p : promises)
      futures.push_back(p.getFuture());

    consumer = std::thread([&]{
      b1.post();
      for (auto& f : futures) {
        sem_wait(&sem);
        f.then(incr<int>);
      }
    });

    producer = std::thread([&]{
      b2.post();
      for (auto& p : promises) {
        sem_post(&sem);
        p.setValue(42);
      }
    });

    b1.wait();
    b2.wait();
  }

  // The astute reader will notice that we're not *precisely* comparing apples
  // to apples here. Well, maybe it's like comparing Granny Smith to
  // Braeburn or something. In the serial version, we waited for the futures
  // to be all set up, but here we are probably still doing that work
  // (although in parallel). But even though there is more work (on the order
  // of 2x), it is being done by two threads. Hopefully most of the difference
  // we see is due to lock contention and not false parallelism.
  //
  // Be warned that if the box is under heavy load, this will greatly skew
  // these results (scheduling overhead will begin to dwarf lock contention).
  // I'm not sure but I'd guess in Windtunnel this will mean large variance,
  // because I expect they load the boxes as much as they can?
  consumer.join();
  producer.join();
}

int main() {
  folly::runBenchmarks();
}