Commit 3c34f066 authored by Nick Burrett's avatar Nick Burrett Committed by Dave Watson

folly::EventBase: wrap libevent calls to prevent race-condition

Summary:
Patch D1585087 exposes two flaws in EventBase().  It introduces IO
worker threads to the ThriftServer which are constructed/destructed in
parallel.

Within the construction phase, a new EventBase() is instantiated for
each thread and unwound in destruction.

When using the BaseControllerTask (in Python), the following sequence
is observed:

a = event_init() [ThriftServer]
b = event_init() [IO worker 1]
c = event_init() [IO worker 2]
...
event_base_free(c)
event_base_free(b)
event_base_free(a)  -> segfault

1. event_init() should only ever be called once.  It internally
modifies a global variable in libevent, current_base to match the
return value.  event_base_free() will set current_base back to NULL if
the passed in arg matches current_base.   Therefore subsequent calls
must use event_base_new().

2. Since current_base is a global and EventBase() is called by multiple
threads, it is important to guard with a mutex.  The guard itself also
exposed the bug because:

a = event_init()  [current_base = a]
b = event_init()  [current_base = b]
...
event_base_free(b) [b == current_base -> current_base = NULL]

So current_base ends up prematurely set to NULL.

Test Plan:
Run dba/core/daemons/dbstatus/dbstatus_tests.lpar, which no longer
segfaults

Reviewed By: jsedgwick@fb.com, davejwatson@fb.com

Subscribers: dihde, evanelias, trunkagent, njormrod, ncoffield, lachlan, folly-diffs@

FB internal diff: D1663654

Tasks: 5545819

Signature: t1:1663654:1415732265:d51c4c4cae99c1ac371460bf18d26d4f917a3c52

Blame Revision: D1585087
parent 05e80bad
......@@ -125,6 +125,16 @@ void EventBase::CobTimeout::timeoutExpired() noexcept {
delete this;
}
// The interface used to libevent is not thread-safe. Calls to
// event_init() and event_base_free() directly modify an internal
// global 'current_base', so a mutex is required to protect this.
//
// event_init() should only ever be called once. Subsequent calls
// should be made to event_base_new(). We can recognise that
// event_init() has already been called by simply inspecting current_base.
static std::mutex libevent_mutex_;
/*
* EventBase methods
*/
......@@ -133,7 +143,6 @@ EventBase::EventBase()
: runOnceCallbacks_(nullptr)
, stop_(false)
, loopThread_(0)
, evb_(static_cast<event_base*>(event_init()))
, queue_(nullptr)
, fnRunner_(nullptr)
, maxLatency_(0)
......@@ -144,6 +153,18 @@ EventBase::EventBase()
, startWork_(0)
, observer_(nullptr)
, observerSampleCount_(0) {
{
std::lock_guard<std::mutex> lock(libevent_mutex_);
// The value 'current_base' (libevent 1) or
// 'event_global_current_base_' (libevent 2) is filled in by event_set(),
// allowing examination of its value without an explicit reference here.
// If ev.ev_base is NULL, then event_init() must be called, otherwise
// call event_base_new().
struct event ev;
event_set(&ev, 0, 0, nullptr, nullptr);
evb_ = (ev.ev_base) ? event_base_new() : event_init();
}
if (UNLIKELY(evb_ == nullptr)) {
LOG(ERROR) << "EventBase(): Failed to init event base.";
folly::throwSystemError("error in EventBase::EventBase()");
......@@ -202,7 +223,10 @@ EventBase::~EventBase() {
// Stop consumer before deleting NotificationQueue
fnRunner_->stopConsuming();
event_base_free(evb_);
{
std::lock_guard<std::mutex> lock(libevent_mutex_);
event_base_free(evb_);
}
VLOG(5) << "EventBase(): Destroyed.";
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment