Optimizations, support for exceptions and big return values for lock_combine
Summary:
Some optimizations and changes to make mutex migrations easier:

- Add exception handling support. This allows using lock_combine almost anywhere a unique_lock would be used, and makes transitioning between locking methods easier and more efficient, since users are no longer required to maintain their own unions anywhere (e.g. with folly::Try).
- Add support for big return values, so people can return anything from the critical section. Without this, users would have to write code of the following form, which is prone to false sharing with the waiting thread's metadata:

```
auto value = ReturnValue{};
mutex.lock_combine([&]() {
  value = critical_section();
});
```

- Add some optimizations, such as inlining the combine code path and an optimistic load that elides a branch. This gives a ~8% throughput improvement over the previous version. More importantly, it prevents compilers from messing up the generated code by dereferencing the waiter node at arbitrary points.
- Defer time publishing for combinable threads until a preemption. This brings us to the same level of efficiency as std::atomic even on Broadwell, takes us to 7x the baseline (std::mutex) on the non-NUMA machines, and gives almost perfectly parallel scaling at moderate concurrency levels. I suspect we can do better with NUMA awareness, but that's for another diff.

Reviewed By: yfeldblum

Differential Revision: D15522658

fbshipit-source-id: 420f4202503305d57b6bd59a9a4ecb67d4dd3c2e
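Why exception and return-value support matters for a combining lock can be illustrated with a toy flat-combining lock. This is a minimal sketch, not folly's DistributedMutex: `ToyCombiningLock`, its `std::mutex`/`std::condition_variable` request queue, and `demo_parallel_increments` are hypothetical names introduced only for illustration. In a combining lock the critical section may run on a different thread (the combiner), so its return value and any exception it throws have to be stored in the waiter's request node and handed back to the thread that submitted it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <exception>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical toy flat-combining lock (NOT folly::DistributedMutex).
// The first thread to find the lock idle becomes the combiner and runs every
// queued critical section, including those submitted by other threads.
class ToyCombiningLock {
  // Type-erased request node; each one lives on the requesting thread's stack.
  struct Node {
    void (*run)(Node*) = nullptr;  // executes the critical section
    bool done = false;             // guarded by mutex_
  };

  template <typename F, typename R>
  struct Request : Node {
    F* func;
    std::optional<R> result;       // return value written by the combiner
    std::exception_ptr exception;  // or the exception the section threw

    explicit Request(F* f) : func(f) {
      this->run = [](Node* n) {
        auto* self = static_cast<Request*>(n);
        try {
          self->result.emplace((*self->func)());
        } catch (...) {
          // Capture the exception so the requesting thread can rethrow it.
          self->exception = std::current_exception();
        }
      };
    }
  };

  std::mutex mutex_;  // protects queue_, busy_ and Node::done
  std::condition_variable cv_;
  std::deque<Node*> queue_;
  bool busy_ = false;  // true while some thread is acting as combiner

 public:
  // Runs func() mutually exclusive with all other critical sections and
  // returns its result (or rethrows its exception) on the calling thread.
  // Assumption: func returns an object type (not void, not a reference).
  template <typename F>
  auto lock_combine(F func) -> std::invoke_result_t<F&> {
    using R = std::invoke_result_t<F&>;
    static_assert(std::is_object_v<R>, "sketch handles object results only");
    Request<F, R> request(&func);

    std::unique_lock<std::mutex> guard(mutex_);
    queue_.push_back(&request);
    if (!busy_) {
      // Become the combiner: drain the queue, including our own request.
      busy_ = true;
      while (!queue_.empty()) {
        Node* node = queue_.front();
        queue_.pop_front();
        guard.unlock();
        node->run(node);  // only the combiner runs sections, one at a time
        guard.lock();
        node->done = true;
        cv_.notify_all();
      }
      busy_ = false;
    } else {
      cv_.wait(guard, [&] { return request.done; });
    }
    guard.unlock();

    if (request.exception) {
      std::rethrow_exception(request.exception);
    }
    return std::move(*request.result);
  }
};

// Usage: several threads bump a shared counter; each call returns the
// post-increment value computed inside the critical section.
inline long demo_parallel_increments(int num_threads, int iterations) {
  ToyCombiningLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        lock.lock_combine([&] { return ++counter; });
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

Keeping the result in a per-request `std::optional<R>` lets callers return values directly instead of writing into a captured variable, and `std::exception_ptr` carries the exception across threads. Unlike the implementation this commit describes, the sketch funnels everything through a `std::mutex` and a condition variable, so it illustrates the semantics only and says nothing about the performance numbers above.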