Improve the SingletonThreadLocal fast path
Summary:
[Folly] Improve the `SingletonThreadLocal` fast path.

The main change is that the thread-local cache is now checked before the static-local guard variable, rather than after.

This change measurably improves the `EventBase` benchmark (`timeMeasurementsOff` goes from 417.45ns to 405.41ns per iteration, about 3% faster):

```name=branch
============================================================================
folly/io/async/test/EventBaseBenchmark.cpp      relative  time/iter  iters/s
============================================================================
timeMeasurementsOn                                            1.02us  980.21K
timeMeasurementsOff                              251.64%    405.41ns    2.47M
============================================================================
```

```name=master
============================================================================
folly/io/async/test/EventBaseBenchmark.cpp      relative  time/iter  iters/s
============================================================================
timeMeasurementsOn                                            1.03us  969.51K
timeMeasurementsOff                              247.08%    417.45ns    2.40M
============================================================================
```

This change shortens the fast path in `folly::RequestContext::getStaticContext()` from 12 instructions (including a stack frame and the guard-variable check) to 4: a thread-local load, a null test, a branch, and a return (listings manually cleaned up):

```name=branch
---- fast path ----
<+0>:   mov    rax,QWORD PTR fs:folly::SingletonThreadLocal<...>::get()::cache@tpoff
<+9>:   test   rax,rax
<+12>:  je     folly::RequestContext::getStaticContext()+16
<+14>:  ret
<+15>:  nop
---- slow path ----
<+16>:  push   rbp
<+17>:  mov    rbp,rsp
<+20>:  call   folly::SingletonThreadLocal<...>::getWrapperOutline()
<+25>:  mov    rdx,QWORD PTR fs:0x0
<+34>:  mov    QWORD PTR fs:folly::SingletonThreadLocal<...>::get()::cache@tpoff,rax
<+43>:  add    rdx,OFFSET FLAT:folly::SingletonThreadLocal<...>::get()::cache@tpoff
<+50>:  mov    QWORD PTR [rax+0x10],rdx
<+54>:  pop    rbp
<+55>:  ret
```

```name=master
---- fast path ----
<+0>:   push   rbp
<+1>:   mov    rbp,rsp
<+4>:   push   rbx
<+5>:   sub    rsp,0x8
<+9>:   cmp    BYTE PTR guard variable for folly::RequestContext::getStaticContext()::singleton[rip],0x0
<+16>:  je     folly::RequestContext::getStaticContext()+48
<+18>:  mov    rax,QWORD PTR fs:folly::SingletonThreadLocal<...>::get()::cache@tpoff
<+27>:  test   rax,rax
<+30>:  je     folly::RequestContext::getStaticContext()+96
<+32>:  mov    rbx,QWORD PTR [rbp-0x8]
<+36>:  leave
<+37>:  ret
<+38>:  nop    WORD PTR cs:[rax+rax*1+0x0]
---- slow path ----
<+48>:  mov    edi,OFFSET FLAT:guard variable for folly::RequestContext::getStaticContext()::singleton
<+53>:  call   __cxa_guard_acquire
<+58>:  test   eax,eax
<+60>:  je     folly::RequestContext::getStaticContext()+18
<+62>:  sub    rsp,0x8
<+66>:  mov    edi,OFFSET FLAT:folly::RequestContext::getStaticContext()::singleton
<+71>:  push   0x0
<+73>:  call   folly::SingletonThreadLocal<...>::SingletonThreadLocal({lambda()})
<+78>:  pop    rax
<+79>:  mov    edi,OFFSET FLAT:guard variable for folly::RequestContext::getStaticContext()::singleton
<+84>:  pop    rdx
<+85>:  call   __cxa_guard_release
<+90>:  jmp    folly::RequestContext::getStaticContext()+18
<+92>:  nop    DWORD PTR [rax+0x0]
<+96>:  call   folly::SingletonThreadLocal<...>::getSlow()
<+101>: mov    rbx,QWORD PTR [rbp-0x8]
<+105>: mov    QWORD PTR fs:folly::SingletonThreadLocal<...>::get()::cache@tpoff,rax
<+114>: leave
<+115>: ret
<+116>: mov    rbx,rax
<+119>: mov    edi,OFFSET FLAT:guard variable for folly::RequestContext::getStaticContext()::singleton
<+124>: call   __cxa_guard_abort
<+129>: mov    rdi,rbx
<+132>: call   _Unwind_Resume
```

Reviewed By: andriigrynenko

Differential Revision: D6763655

fbshipit-source-id: 6f2d317ffd40a4e1f143b4bbbd087e85cc667b8c
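The reordering can be illustrated with a minimal C++ sketch. It is not folly's code: `Context`, `Owner`, `getContextGuardFirst`, `getContextCacheFirst`, `getContextSlow`, and `g_ownerConstructions` are hypothetical names, and `Owner` merely stands in for the `SingletonThreadLocal` object that `getStaticContext()` keeps in a function-local static. The sketch only shows the order of the two checks and omits any cache invalidation or teardown the real class may need.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter: gives Owner a non-trivial constructor, so the
// function-local statics below really do need a guard variable.
std::atomic<int> g_ownerConstructions{0};

// Hypothetical per-thread value handed out by the singleton.
struct Context {
  int counter = 0;
};

// Hypothetical stand-in for the process-wide singleton object.
struct Owner {
  Owner() { ++g_ownerConstructions; }
  Context& localContext() {
    thread_local Context ctx;  // one Context per thread
    return ctx;
  }
};

// Guard first (the old ordering): the static's guard variable is tested on
// every call, and only then is the thread-local cache consulted.
Context& getContextGuardFirst() {
  static Owner owner;  // guard check on every call
  thread_local Context* cache = nullptr;
  if (cache == nullptr) {
    cache = &owner.localContext();
  }
  return *cache;
}

// Slow path: the static (and its guard) is touched only on a cache miss.
Context& getContextSlow(Context*& cache) {
  assert(cache == nullptr && "slow path runs only on a cache miss");
  static Owner owner;
  cache = &owner.localContext();
  return *cache;
}

// Cache first (the new ordering): a hit costs one thread-local load and a
// null test; everything else lives in getContextSlow().
Context& getContextCacheFirst() {
  thread_local Context* cache = nullptr;  // constant-initialized, no TLS guard
  if (cache != nullptr) {
    return *cache;
  }
  return getContextSlow(cache);
}
```

Both functions hand out the same per-thread `Context`; only `getContextCacheFirst()` keeps the guard variable off the hit path, which mirrors the shape of the branch listing. Keeping the miss handling in a separate function is what lets the compiler emit a short fast path with no stack frame.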