Commit 8d329050 authored by Qi Wang, committed by Facebook Github Bot

Try using the last Deferred reader slot first

Summary:
When trying to find an empty deferred reader slot, getting the current CPU can
take quite a few cycles, e.g. >1% CPU on SMC (https://fburl.com/434646643).

Let's track the last slot used by this thread and try that slot first before reading
CPU id and doing the search.
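The idea can be sketched outside of SharedMutex as follows. This is a minimal illustration, not the folly implementation: `slots`, `kNumSlots`, `tlsLastSlot`, `tlsSlowPathCalls`, `expensiveStartingSlot`, `acquireSlot` and `releaseSlot` are all hypothetical names, and the sketch claims the slot with a compare-exchange directly, whereas the diff only does a relaxed load at this point.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; these are not folly's definitions.
constexpr uint32_t kNumSlots = 8;
std::array<std::atomic<uintptr_t>, kNumSlots> slots{};

// Last slot this thread found empty during a search.
thread_local uint32_t tlsLastSlot = 0;

// Counts how often the expensive fallback runs (illustration only).
thread_local uint32_t tlsSlowPathCalls = 0;

// Placeholder for the costly "which CPU am I on?" query (assumption).
uint32_t expensiveStartingSlot() {
  ++tlsSlowPathCalls;
  return 0;
}

// Stores `owner` into an empty slot and returns the slot index,
// or returns kNumSlots if every slot is occupied.
uint32_t acquireSlot(uintptr_t owner) {
  uint32_t slot = tlsLastSlot;
  uintptr_t expected = 0;
  // Fast path: try the most recently used slot before anything else.
  if (slots[slot].compare_exchange_strong(expected, owner)) {
    return slot;
  }
  // Slow path: pay for the starting-point lookup, then search.
  uint32_t start = expensiveStartingSlot();
  for (uint32_t i = 0; i < kNumSlots; ++i) {
    slot = (start + i) % kNumSlots;
    expected = 0;
    if (slots[slot].compare_exchange_strong(expected, owner)) {
      tlsLastSlot = slot;  // remember the slot for the next call
      return slot;
    }
  }
  return kNumSlots;
}

// Marks a previously acquired slot as empty again.
void releaseSlot(uint32_t slot) {
  assert(slot < kNumSlots);
  slots[slot].store(0, std::memory_order_release);
}
```

As long as a thread keeps finding its cached slot empty, `expensiveStartingSlot()` is never called, which is the saving this diff is after.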

Microbenchmark results generally seem to improve (though they are a bit noisy,
so it is unclear how much to trust them). Results with this diff are on the left side:
P56648675

Reviewed By: nbronson

Differential Revision: D3857793

fbshipit-source-id: 8b1c005362c82e748a663100f889b0b99dc257fe
parent 5bebf3c9
@@ -723,6 +723,10 @@ class SharedMutexImpl {
// This is the starting location for Token-less unlock_shared().
static FOLLY_SHAREDMUTEX_TLS uint32_t tls_lastTokenlessSlot;
// Last deferred reader slot used.
static FOLLY_SHAREDMUTEX_TLS uint32_t tls_lastDeferredReaderSlot;
// Only indexes divisible by kDeferredSeparationFactor are used.
// If any of those elements points to a SharedMutexImpl, then it
// should be considered that there is a shared lock on that instance.
@@ -1343,6 +1347,15 @@ FOLLY_SHAREDMUTEX_TLS uint32_t
SharedMutexImpl<ReaderPriority, Tag_, Atom, BlockImmediately>::
tls_lastTokenlessSlot = 0;
template <
bool ReaderPriority,
typename Tag_,
template <typename> class Atom,
bool BlockImmediately>
FOLLY_SHAREDMUTEX_TLS uint32_t
SharedMutexImpl<ReaderPriority, Tag_, Atom, BlockImmediately>::
tls_lastDeferredReaderSlot = 0;
template <
bool ReaderPriority,
typename Tag_,
@@ -1385,6 +1398,10 @@ bool SharedMutexImpl<ReaderPriority, Tag_, Atom, BlockImmediately>::
(state & kHasS) >= (kNumSharedToStartDeferring - 1) * kIncrHasS;
bool drainInProgress = ReaderPriority && (state & kBegunE) != 0;
if (canAlreadyDefer || (aboveDeferThreshold && !drainInProgress)) {
/* Try using the most recent slot first. */
slot = tls_lastDeferredReaderSlot;
slotValue = deferredReader(slot)->load(std::memory_order_relaxed);
if (slotValue != 0) {
// starting point for our empty-slot search, can change after
// calling waitForZeroBits
uint32_t bestSlot =
@@ -1399,10 +1416,12 @@ bool SharedMutexImpl<ReaderPriority, Tag_, Atom, BlockImmediately>::
slotValue = deferredReader(slot)->load(std::memory_order_relaxed);
if (slotValue == 0) {
// found empty slot
tls_lastDeferredReaderSlot = slot;
break;
}
}
}
}
if (slotValue != 0) {
// not yet deferred, or no empty slots
......
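For readers unfamiliar with the pattern in the second hunk, a static thread-local data member of a class template needs an out-of-class definition that repeats the template header. The sketch below shows the shape with plain `thread_local`, assuming `FOLLY_SHAREDMUTEX_TLS` expands to a thread-local storage specifier. `SlotCache`, `last`, `remember` and `tls_lastSlot` are hypothetical names, not part of folly.

```cpp
#include <cstdint>

// Hypothetical class template; each instantiation gets its own
// per-thread copy of tls_lastSlot.
template <bool ReaderPriority, typename Tag>
class SlotCache {
 public:
  static uint32_t last() { return tls_lastSlot; }
  static void remember(uint32_t slot) { tls_lastSlot = slot; }

 private:
  // Declaration only; the definition follows outside the class.
  static thread_local uint32_t tls_lastSlot;
};

// Out-of-class definition: the template header and the thread_local
// specifier are both repeated, and the member starts at slot 0.
template <bool ReaderPriority, typename Tag>
thread_local uint32_t SlotCache<ReaderPriority, Tag>::tls_lastSlot = 0;
```

Because the member is static per instantiation, mutexes with different template arguments do not share a cached slot, while all instances of the same instantiation on one thread do.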