Fix a TSAN-reported race condition in ThreadLocal.
Summary: This is a pretty cryptic race condition, but it works like this: Each thread local instance (i.e. one ThreadLocalPtr) is assigned a unique ID by the static meta. This ID is freed up when the thread local is destroyed, and may then be reused by another newly created thread local. For each thread we keep an array of elements (ElementWrapper), indexed by the thread local ID. Each element has some metadata such as the pointer to the object being cached for that thread and that thread local. There are a few cases where we access a ElementWrapper object: - We write to it under a global lock, when destroying a thread local object (only the i-th element is written to for each thread, if the ID is 'i' for the thread local). - We write to it under a global lock, when reallocating the array of elements for the thread if a new thread local ID exceeds the capacity of the array; this can happen when we are reading the thread local for a thread. - We write to it under a global lock when the corresponding thread is destroyed. - We read from it, *without* any locks, when reading a thread local's value from a thread and the above reallocation case does not occur. The race condition happens in this case: - Thread locals A and B are created - Thread local A is accessed by thread Z, and gets, say ID = 5, the corresponding element E is initialized - Thread local A is destroyed by thread X, and under the global lock, the element E is written to (setting element->ptr = nullptr); ID 5 is released to the list of free IDs. - Thread local B is accessed for the first time by thread Y, and therefore needs to grab an ID; under the global lock, it happens to get ID = 5, and stores that ID into the thread local instance as an atomic int; the store here uses std::memory_order_seq_cst. - Thread local B is then accessed again from thread Z; this time the atomic int ID is loaded with std::memory_order_relaxed, with the resulting value of 5; we then proceed to read element E's ptr value *without any locks*. TSAN reports the last read of element E to be racy with the previous write of element E during thread X's destruction of thread local A. Even though this happened with a mutex operation in between, the final std::memory_order_relaxed causes TSAN to think that the read access was not properly synchronized after the earlier write. This conjecture is confirmed by the following small reproducible test case: ``` TEST(MyTest, Test) { std::mutex mutex; int x = 0; bool written = false; std::atomic<bool> seen{false}; std::thread t0([&] { while (!seen.load(std::memory_order_relaxed)) { } std::cout << x; }); std::thread t1([&] { while (true) { std::lock_guard<std::mutex> guard(mutex); if (written) { seen.store(true, std::memory_order_release); break; } } }); std::thread t2([&] { std::lock_guard<std::mutex> guard(mutex); x = 1; written = true; }); t0.join(); t1.join(); t2.join(); } ``` (This example, when run under TSAN, reports a race condition between "x = 1" and "std::cout << x".) This change makes the load use std::memory_order_acquire so it synchronizes with the store. Should have no performance implications on x86 but I'm not totally confident about that. While we're at it, also changing the allocate() and destroy() functions to use more relaxed memory orders; sequential consistency is unnecessary in these places. Reviewed By: yfeldblum Differential Revision: D22561685 fbshipit-source-id: 0d53925486ef78139ffe71681827b0c0959dd29e
Showing
Please register or sign in to comment