Improve fast path of Cursor
Summary: This change simplifies the fast path by reducing it to the bare minimum (check the remaining length, load the data) and removes the indirection through IOBuf. Additionally, it adds a `skipNoAdvance` method to provide a 1-instruction skip.

Before this change, the disassembly of `read<signed char>` was over 35 instructions (hot path only). With this change it is down to 8.

Disassembly after:

Dump of assembler code for function folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::read<unsigned char>():
   0x000000000041f0f0 <+0>:   mov    0x18(%rdi),%rax
   0x000000000041f0f4 <+4>:   lea    0x1(%rax),%rcx
   0x000000000041f0f8 <+8>:   cmp    0x10(%rdi),%rcx
   0x000000000041f0fc <+12>:  ja     0x41f105 <folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::read<unsigned char>()+21>
   0x000000000041f0fe <+14>:  mov    (%rax),%al
   0x000000000041f100 <+16>:  mov    %rcx,0x18(%rdi)
   0x000000000041f104 <+20>:  retq
   0x000000000041f105 <+21>:  jmpq   0x41f110 <folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::readSlow<unsigned char>()>

With this diff, Thrift deserialization becomes ~20% faster (with prod workloads).
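For readers who want the shape of the fast path in source form, here is a minimal sketch, assuming a single contiguous buffer. `MiniCursor` and its members are hypothetical names for illustration and are not folly's actual implementation; the only things taken from the description above are the split into a short `read` (length check, load, pointer bump) with an out-of-line `readSlow`, and a `skipNoAdvance` that only bumps the position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical single-buffer cursor used only to illustrate the idea.
class MiniCursor {
 public:
  MiniCursor(const std::uint8_t* data, std::size_t size)
      : crtPos_(data), crtEnd_(data + size) {}

  template <class T>
  T read() {
    // Fast path: one length check, one load, one position update.
    if (static_cast<std::size_t>(crtEnd_ - crtPos_) >= sizeof(T)) {
      T val;
      std::memcpy(&val, crtPos_, sizeof(T)); // unaligned-safe load
      crtPos_ += sizeof(T);
      return val;
    }
    // Everything else is kept out of line so the hot path stays small.
    return readSlow<T>();
  }

  // Skips len bytes without any check. The caller must guarantee that
  // at least len bytes remain in the current buffer.
  void skipNoAdvance(std::size_t len) { crtPos_ += len; }

  // Bytes remaining in the current buffer.
  std::size_t length() const {
    return static_cast<std::size_t>(crtEnd_ - crtPos_);
  }

 private:
  template <class T>
  T readSlow() {
    // Assumption: a chained-buffer cursor would move to the next buffer
    // here. This sketch has only one buffer, so it reports underflow.
    throw std::out_of_range("MiniCursor: underflow");
  }

  const std::uint8_t* crtPos_; // current read position
  const std::uint8_t* crtEnd_; // one past the last readable byte
};
```

Because the cursor holds the position and end pointers itself, the fast path never has to dereference the buffer object, which is the kind of indirection the summary says was removed.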
Thrift benchmark:

Before:

============================================================================
thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s
============================================================================
BinaryProtocol_read_Empty                                   12.98ns   77.03M
BinaryProtocol_read_SmallInt                                20.94ns   47.76M
BinaryProtocol_read_BigInt                                  20.86ns   47.93M
BinaryProtocol_read_SmallString                             34.64ns   28.86M
BinaryProtocol_read_BigString                              185.53ns    5.39M
BinaryProtocol_read_BigBinary                               67.34ns   14.85M
BinaryProtocol_read_LargeBinary                             62.23ns   16.07M
BinaryProtocol_read_Mixed                                   58.74ns   17.03M
BinaryProtocol_read_SmallListInt                            89.99ns   11.11M
BinaryProtocol_read_BigListInt                              39.92us   25.05K
BinaryProtocol_read_BigListMixed                           616.20us    1.62K
BinaryProtocol_read_LargeListMixed                          83.49ms    11.98
CompactProtocol_read_Empty                                  11.28ns   88.67M
CompactProtocol_read_SmallInt                               19.15ns   52.22M
CompactProtocol_read_BigInt                                 26.14ns   38.25M
CompactProtocol_read_SmallString                            31.04ns   32.22M
CompactProtocol_read_BigString                             184.55ns    5.42M
CompactProtocol_read_BigBinary                              69.73ns   14.34M
CompactProtocol_read_LargeBinary                            64.39ns   15.53M
CompactProtocol_read_Mixed                                  58.73ns   17.03M
CompactProtocol_read_SmallListInt                           76.50ns   13.07M
CompactProtocol_read_BigListInt                             25.93us   38.56K
CompactProtocol_read_BigListMixed                          623.15us    1.60K
CompactProtocol_read_LargeListMixed                         80.57ms    12.41
============================================================================

After:

============================================================================
thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s
============================================================================
BinaryProtocol_read_Empty                                   10.40ns   96.17M
BinaryProtocol_read_SmallInt                                15.14ns   66.03M
BinaryProtocol_read_BigInt                                  15.19ns   65.84M
BinaryProtocol_read_SmallString                             25.19ns   39.70M
BinaryProtocol_read_BigString                              172.85ns    5.79M
BinaryProtocol_read_BigBinary                               56.88ns   17.58M
BinaryProtocol_read_LargeBinary                             56.77ns   17.61M
BinaryProtocol_read_Mixed                                   43.98ns   22.74M
BinaryProtocol_read_SmallListInt                            58.19ns   17.19M
BinaryProtocol_read_BigListInt                              19.75us   50.63K
BinaryProtocol_read_BigListMixed                           440.20us    2.27K
BinaryProtocol_read_LargeListMixed                          56.94ms    17.56
CompactProtocol_read_Empty                                   9.35ns  106.93M
CompactProtocol_read_SmallInt                               13.07ns   76.49M
CompactProtocol_read_BigInt                                 18.23ns   54.87M
CompactProtocol_read_SmallString                            25.61ns   39.05M
CompactProtocol_read_BigString                             174.46ns    5.73M
CompactProtocol_read_BigBinary                              59.77ns   16.73M
CompactProtocol_read_LargeBinary                            60.81ns   16.44M
CompactProtocol_read_Mixed                                  42.70ns   23.42M
CompactProtocol_read_SmallListInt                           66.89ns   14.95M
CompactProtocol_read_BigListInt                             25.08us   39.87K
CompactProtocol_read_BigListMixed                          427.93us    2.34K
CompactProtocol_read_LargeListMixed                         56.11ms    17.82
============================================================================

Reviewed By: yfeldblum
Differential Revision: D6635325
fbshipit-source-id: 393fc1005689042977c03f37f5a898ebe7814d44