    various perf improvements · ca1e87ed
    James Sedgwick authored
    Summary: Three strategies:
    1. Optimistic locking
    2. Acquire-release memory ordering instead of full sequential consistency
    3. Some low-hanging branch miss optimizations
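
    Strategies 1 and 2 can be illustrated with a minimal sketch. It is not the code from this commit: `OneShotSlot`, `set`, and `tryGet` are hypothetical names. "Optimistic locking" is read here as checking an atomic flag first and taking the mutex only on the write path. The reader's acquire load pairs with the writer's release store, which is enough to make the published value visible without the default `std::memory_order_seq_cst` ordering.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <mutex>
    #include <utility>

    // Hypothetical one-shot slot, NOT folly's futures Core: a producer
    // publishes a value once and consumers read it.
    template <class T>
    class OneShotSlot {
     public:
      // Write path: the mutex only serializes competing writers.
      bool set(T v) {
        std::lock_guard<std::mutex> guard(mu_);
        // Relaxed is enough here: ready_ is only ever written under mu_.
        if (ready_.load(std::memory_order_relaxed)) {
          return false;  // already set; value_ is never overwritten
        }
        value_ = std::move(v);
        // Release: the write to value_ happens-before any acquire load
        // that observes ready_ == true.
        ready_.store(true, std::memory_order_release);
        return true;
      }

      // Read path, optimistic: no mutex at all. The acquire load pairs with
      // the release store in set(), so once ready_ is true value_ is fully
      // visible, and since it is never written again there is no data race.
      bool tryGet(T& out) const {
        if (!ready_.load(std::memory_order_acquire)) {
          return false;  // not published yet
        }
        out = value_;
        return true;
      }

     private:
      std::mutex mu_;
      std::atomic<bool> ready_{false};
      T value_{};
    };
    ```

    On x86, for example, a seq_cst store typically compiles to an `xchg` (or `mov` plus `mfence`), while a release store is a plain `mov`; that difference is one reason weakening the ordering can show up in microbenchmarks like the ones below.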
    
    Please review carefully; the dogscience is strong with this one
    
    ```
    Before:
    
    ============================================================================
    folly/futures/test/Benchmark.cpp                relative  time/iter  iters/s
    ============================================================================
    constantFuture                                             127.99ns    7.81M
    promiseAndFuture                                  94.89%   134.89ns    7.41M
    withThen                                          28.40%   450.63ns    2.22M
    ----------------------------------------------------------------------------
    oneThen                                                    446.68ns    2.24M
    twoThens                                          58.35%   765.55ns    1.31M
    fourThens                                         31.87%     1.40us  713.41K
    hundredThens                                       1.61%    27.78us   35.99K
    ----------------------------------------------------------------------------
    no_contention                                                4.63ms   216.00
    contention                                        80.79%     5.73ms   174.52
    ----------------------------------------------------------------------------
    throwAndCatch                                               10.91us   91.64K
    throwAndCatchWrapped                             127.14%     8.58us  116.51K
    throwWrappedAndCatch                             178.22%     6.12us  163.32K
    throwWrappedAndCatchWrapped                      793.75%     1.37us  727.38K
    ----------------------------------------------------------------------------
    throwAndCatchContended                                        1.35s  741.33m
    throwAndCatchWrappedContended                    139.18%   969.23ms     1.03
    throwWrappedAndCatchContended                    169.51%   795.76ms     1.26
    throwWrappedAndCatchWrappedContended            17742.23%     7.60ms   131.53
    ----------------------------------------------------------------------------
    complexUnit                                                127.50us    7.84K
    complexBlob4                                     100.14%   127.32us    7.85K
    complexBlob8                                     100.16%   127.30us    7.86K
    complexBlob64                                     96.45%   132.19us    7.57K
    complexBlob128                                    92.83%   137.35us    7.28K
    complexBlob256                                    87.79%   145.23us    6.89K
    complexBlob512                                    81.64%   156.18us    6.40K
    complexBlob1024                                   72.54%   175.76us    5.69K
    complexBlob2048                                   58.52%   217.89us    4.59K
    complexBlob4096                                   32.54%   391.78us    2.55K
    ============================================================================
    
    After:

    ============================================================================
    folly/futures/test/Benchmark.cpp                relative  time/iter  iters/s
    ============================================================================
    constantFuture                                              85.28ns   11.73M
    promiseAndFuture                                  88.63%    96.22ns   10.39M
    withThen                                          30.46%   279.99ns    3.57M
    ----------------------------------------------------------------------------
    oneThen                                                    231.18ns    4.33M
    twoThens                                          60.57%   381.70ns    2.62M
    fourThens                                         33.52%   689.71ns    1.45M
    hundredThens                                       1.49%    15.48us   64.58K
    ----------------------------------------------------------------------------
    no_contention                                                3.84ms   260.19
    contention                                        88.29%     4.35ms   229.73
    ----------------------------------------------------------------------------
    throwAndCatch                                               10.63us   94.06K
    throwAndCatchWrapped                             127.17%     8.36us  119.61K
    throwWrappedAndCatch                             179.83%     5.91us  169.15K
    throwWrappedAndCatchWrapped                     1014.48%     1.05us  954.19K
    ----------------------------------------------------------------------------
    throwAndCatchContended                                        1.34s  749.03m
    throwAndCatchWrappedContended                    140.66%   949.16ms     1.05
    throwWrappedAndCatchContended                    164.87%   809.77ms     1.23
    throwWrappedAndCatchWrappedContended            49406.39%     2.70ms   370.07
    ----------------------------------------------------------------------------
    complexUnit                                                 86.83us   11.52K
    complexBlob4                                      97.42%    89.12us   11.22K
    complexBlob8                                      96.63%    89.85us   11.13K
    complexBlob64                                     92.53%    93.84us   10.66K
    complexBlob128                                    90.85%    95.57us   10.46K
    complexBlob256                                    82.56%   105.17us    9.51K
    complexBlob512                                    74.13%   117.12us    8.54K
    complexBlob1024                                   63.67%   136.37us    7.33K
    complexBlob2048                                   50.25%   172.79us    5.79K
    complexBlob4096                                   26.63%   326.05us    3.07K
    ============================================================================
    ```
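
    The `relative` column compares each row with the first (baseline) benchmark of its group within the same run, not Before against After. To read the effect of this change, divide the Before time/iter by the After time/iter. The sketch below does that for a few rows, with the times copied from the two tables and converted to nanoseconds; `Row`, `speedup`, and `printSpeedups` are illustrative names only.

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Per-iteration times copied from the Before/After tables, in nanoseconds.
    struct Row {
      const char* name;
      double beforeNs;
      double afterNs;
    };

    // Speedup = old time / new time; a value above 1.0 means faster after.
    double speedup(const Row& r) { return r.beforeNs / r.afterNs; }

    const Row kRows[] = {
        {"constantFuture", 127.99, 85.28},
        {"oneThen", 446.68, 231.18},
        {"hundredThens", 27780.0, 15480.0},  // 27.78us -> 15.48us
        {"complexUnit", 127500.0, 86830.0},  // 127.50us -> 86.83us
    };

    // Print one "name speedup" line per row.
    void printSpeedups() {
      for (const Row& r : kRows) {
        std::printf("%-16s %.2fx\n", r.name, speedup(r));
      }
    }
    ```

    For these rows the ratio works out to roughly 1.5x to 1.9x, with `oneThen` close to 1.93x.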
    
    Reviewed By: @djwatson
    
    Differential Revision: D2139822