various perf improvements
Summary: Three strategies
1. Optimistic locking
2. Acquire-release memory ordering instead of full sequential consistency
3. Some low-hanging branch miss optimizations

Please review carefully; the dogscience is strong with this one

```
Before:
============================================================================
folly/futures/test/Benchmark.cpp                relative  time/iter  iters/s
============================================================================
constantFuture                                             127.99ns    7.81M
promiseAndFuture                                  94.89%   134.89ns    7.41M
withThen                                          28.40%   450.63ns    2.22M
----------------------------------------------------------------------------
oneThen                                                    446.68ns    2.24M
twoThens                                          58.35%   765.55ns    1.31M
fourThens                                         31.87%     1.40us  713.41K
hundredThens                                       1.61%    27.78us   35.99K
----------------------------------------------------------------------------
no_contention                                                4.63ms   216.00
contention                                        80.79%     5.73ms   174.52
----------------------------------------------------------------------------
throwAndCatch                                               10.91us   91.64K
throwAndCatchWrapped                             127.14%     8.58us  116.51K
throwWrappedAndCatch                             178.22%     6.12us  163.32K
throwWrappedAndCatchWrapped                      793.75%     1.37us  727.38K
----------------------------------------------------------------------------
throwAndCatchContended                                        1.35s  741.33m
throwAndCatchWrappedContended                    139.18%   969.23ms     1.03
throwWrappedAndCatchContended                    169.51%   795.76ms     1.26
throwWrappedAndCatchWrappedContended           17742.23%     7.60ms   131.53
----------------------------------------------------------------------------
complexUnit                                                127.50us    7.84K
complexBlob4                                     100.14%   127.32us    7.85K
complexBlob8                                     100.16%   127.30us    7.86K
complexBlob64                                     96.45%   132.19us    7.57K
complexBlob128                                    92.83%   137.35us    7.28K
complexBlob256                                    87.79%   145.23us    6.89K
complexBlob512                                    81.64%   156.18us    6.40K
complexBlob1024                                   72.54%   175.76us    5.69K
complexBlob2048                                   58.52%   217.89us    4.59K
complexBlob4096                                   32.54%   391.78us    2.55K
============================================================================

After:
============================================================================
folly/futures/test/Benchmark.cpp                relative  time/iter  iters/s
============================================================================
constantFuture                                              85.28ns   11.73M
promiseAndFuture                                  88.63%    96.22ns   10.39M
withThen                                          30.46%   279.99ns    3.57M
----------------------------------------------------------------------------
oneThen                                                    231.18ns    4.33M
twoThens                                          60.57%   381.70ns    2.62M
fourThens                                         33.52%   689.71ns    1.45M
hundredThens                                       1.49%    15.48us   64.58K
----------------------------------------------------------------------------
no_contention                                                3.84ms   260.19
contention                                        88.29%     4.35ms   229.73
----------------------------------------------------------------------------
throwAndCatch                                               10.63us   94.06K
throwAndCatchWrapped                             127.17%     8.36us  119.61K
throwWrappedAndCatch                             179.83%     5.91us  169.15K
throwWrappedAndCatchWrapped                     1014.48%     1.05us  954.19K
----------------------------------------------------------------------------
throwAndCatchContended                                        1.34s  749.03m
throwAndCatchWrappedContended                    140.66%   949.16ms     1.05
throwWrappedAndCatchContended                    164.87%   809.77ms     1.23
throwWrappedAndCatchWrappedContended           49406.39%     2.70ms   370.07
----------------------------------------------------------------------------
complexUnit                                                 86.83us   11.52K
complexBlob4                                      97.42%    89.12us   11.22K
complexBlob8                                      96.63%    89.85us   11.13K
complexBlob64                                     92.53%    93.84us   10.66K
complexBlob128                                    90.85%    95.57us   10.46K
complexBlob256                                    82.56%   105.17us    9.51K
complexBlob512                                    74.13%   117.12us    8.54K
complexBlob1024                                   63.67%   136.37us    7.33K
complexBlob2048                                   50.25%   172.79us    5.79K
complexBlob4096                                   26.63%   326.05us    3.07K
============================================================================
```

Reviewed By: @djwatson

Differential Revision: D2139822
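
Note on strategy 2, for readers unfamiliar with the idea: the sketch below is not code from this diff. It is a minimal illustration of an acquire-release handoff, assuming a hypothetical one-shot `Slot` type and a hypothetical `runHandoff` helper. The producer writes a plain payload and then publishes it with a release store; the consumer observes the flag with an acquire load, which is enough to make the payload write visible without paying for `memory_order_seq_cst`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-shot slot for illustration; not folly's Core class.
struct Slot {
  int value = 0;                   // plain (non-atomic) payload
  std::atomic<bool> ready{false};  // publication flag

  // Producer: write the payload first, then publish with a release store.
  void publish(int v) {
    value = v;
    ready.store(true, std::memory_order_release);
  }

  // Consumer: wait until the flag is seen via an acquire load.
  // The release/acquire pair orders the payload write before this read.
  int consume() const {
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    return value;
  }
};

// Hypothetical helper: runs one producer and one consumer thread and
// returns the value the consumer observed.
int runHandoff(int v) {
  Slot slot;
  int seen = -1;
  std::thread consumer([&] { seen = slot.consume(); });
  std::thread producer([&] { slot.publish(v); });
  producer.join();
  consumer.join();
  return seen;
}
```

The weaker ordering is sufficient here only because a single flag guards a single payload; code that relies on one total order across several atomics would still need sequential consistency.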