    improve io::Cursor read() performance for small sizeof(T) · 173356a3
    Philip Pronin authored
    Summary:
    I just found that gcc (4.8.2) failed to unroll the loop in
    `pullAtMost()`, so it didn't replace `memcpy` with a simple load
    for small `len`.
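
A minimal sketch of the general idea, not the code from this commit: when the current buffer already holds `sizeof(T)` bytes, `read<T>()` can issue a `memcpy` whose size is a compile-time constant, which compilers typically lower to a plain load, and fall back to the generic copy loop only when the value spans buffers. `Chunk` and `SketchCursor` are hypothetical stand-ins for illustration and are not folly's `IOBuf` or `io::Cursor`; only the name `pullAtMost()` is taken from the summary above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for one buffer in a chain (not folly::IOBuf).
struct Chunk {
  const std::uint8_t* data;
  std::size_t size;
};

// Hypothetical cursor over a chain of chunks (not folly::io::Cursor).
class SketchCursor {
 public:
  explicit SketchCursor(std::vector<Chunk> chunks)
      : chunks_(std::move(chunks)) {}

  template <class T>
  T read() {
    static_assert(std::is_trivially_copyable<T>::value,
                  "read<T>() copies raw bytes");
    T val;
    if (available() >= sizeof(T)) {
      // Fast path: the copy size is a compile-time constant, so the
      // compiler does not need to unroll any loop to emit a simple load.
      std::memcpy(&val, chunks_[index_].data + offset_, sizeof(T));
      offset_ += sizeof(T);
    } else if (pullAtMost(&val, sizeof(T)) != sizeof(T)) {
      // Slow path ran out of data before filling the value.
      throw std::out_of_range("underflow");
    }
    return val;
  }

 private:
  // Bytes left in the current chunk (0 once the chain is exhausted).
  std::size_t available() const {
    return index_ < chunks_.size() ? chunks_[index_].size - offset_ : 0;
  }

  // Generic path: copy up to len bytes, walking across chunk boundaries.
  // Returns the number of bytes actually copied.
  std::size_t pullAtMost(void* buf, std::size_t len) {
    auto* out = static_cast<std::uint8_t*>(buf);
    std::size_t copied = 0;
    while (index_ < chunks_.size()) {
      const std::size_t n = std::min(len - copied, available());
      if (n > 0) {
        std::memcpy(out + copied, chunks_[index_].data + offset_, n);
        copied += n;
        offset_ += n;
      }
      if (copied == len) {
        break;
      }
      // Current chunk is exhausted; move to the next one.
      ++index_;
      offset_ = 0;
    }
    return copied;
  }

  std::vector<Chunk> chunks_;
  std::size_t index_ = 0;   // current chunk
  std::size_t offset_ = 0;  // read position inside the current chunk
};
```

In this sketch a value that straddles two chunks still goes through the loop, so only the common case (the value fits in the current chunk) benefits from the fixed-size copy.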
    
    Test Plan:
    fbconfig -r folly/io/test thrift/lib/cpp2/test && fbmake runtests_opt -j32
    
    Ran the unicorn-specific thrift deserialization benchmark from
    D1724070 and verified a 50% improvement in `SearchRequest`
    deserialization performance.
    
    `thrift/lib/cpp2/test/ProtocolBench` results:
    
    ```
                                                             |---- before -----| |---- after  -----|
    ================================================================================================
    thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s  time/iter  iters/s
    ================================================================================================
    BinaryProtocol_read_Empty                                   21.72ns   46.04M    17.58ns   56.89M
    BinaryProtocol_read_SmallInt                                43.03ns   23.24M    23.64ns   42.30M
    BinaryProtocol_read_BigInt                                  43.72ns   22.87M    22.03ns   45.38M
    BinaryProtocol_read_SmallString                             88.57ns   11.29M    47.01ns   21.27M
    BinaryProtocol_read_BigString                              365.76ns    2.73M   323.58ns    3.09M
    BinaryProtocol_read_BigBinary                              207.78ns    4.81M   169.09ns    5.91M
    BinaryProtocol_read_LargeBinary                            187.81ns    5.32M   172.09ns    5.81M
    BinaryProtocol_read_Mixed                                  161.18ns    6.20M    68.41ns   14.62M
    BinaryProtocol_read_SmallListInt                           177.32ns    5.64M    96.91ns   10.32M
    BinaryProtocol_read_BigListInt                              77.03us   12.98K    15.88us   62.97K
    BinaryProtocol_read_BigListMixed                             1.79ms   557.79   923.99us    1.08K
    BinaryProtocol_read_LargeListMixed                         195.01ms     5.13   103.78ms     9.64
    ================================================================================================
    ```
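
The `relative` column is empty in the table, so the per-benchmark speedup has to be derived from the two `time/iter` columns as before divided by after. A small sketch of that arithmetic, using three rows copied from the table; the `Row` struct and the `speedup` and `printSpeedups` helpers are illustrative names, not part of ProtocolBench.

```cpp
#include <cassert>
#include <cstdio>

// Speedup factor: how many times faster "after" is than "before".
// Both times must be expressed in the same unit.
double speedup(double beforeTime, double afterTime) {
  return beforeTime / afterTime;
}

struct Row {
  const char* name;
  double before;  // time/iter before the change
  double after;   // time/iter after the change
};

// Rows copied from the table; each row keeps its own unit.
const Row kRows[] = {
    {"BinaryProtocol_read_SmallInt", 43.03, 23.64},    // ns
    {"BinaryProtocol_read_Mixed", 161.18, 68.41},      // ns
    {"BinaryProtocol_read_BigListInt", 77.03, 15.88},  // us
};

// Prints one "name  N.NNx" line per row.
void printSpeedups() {
  for (const Row& r : kRows) {
    std::printf("%-32s %.2fx\n", r.name, speedup(r.before, r.after));
  }
}
```

Because each ratio is taken within a single row, the nanosecond and microsecond rows can be mixed without any unit conversion.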
    
    Reviewed By: soren@fb.com
    
    Subscribers: alandau, bmatheny, mshneer, trunkagent, njormrod, folly-diffs@
    
    FB internal diff: D1724111
    
    Tasks: 5770136
    
    Signature: t1:1724111:1417977810:b7d643d0c819a0bbac77fa0048206153929e50a8