Attempt to fix a race condition in ThreadLocal related to ThreadEntryNode.
Summary: See task T70610458 description for the TSAN error reported. The scenario that can trigger this race condition is: * Thread A has a ThreadEntryNode N whose `next` points to thread B's ThreadEntry. * Thread B exits, which causes it to delete its ThreadEntry and all its nodes; therefore writing `N->next = <the next entry after B>`. * At the same time, thread A accesses the thread local corresponding to N; when checking whether the ThreadEntryNode is already added to the linked list, it checks whether the `next` pointer is null; this read races with the above write. Even though this is a technically a race condition, it is unlikely to actually trigger any negative effect in practice, because at thread exit, the overwrite of `N->next = ...` will never assign a nullptr, so it will never affect the outcome of `if (!next)` in thread A. Only the thread that owns the ThreadEntryNode has the ability to change between `next == nullptr` and `next != nullptr`. (The former is called "zero" in the code). Nonetheless, C++ says "race condition leads to undefined behavior", so it's still good to fix... It's not clear what the best way to fix this is. Here I've presented one fix, which is to introduce another field in ThreadEntryNode that is private to the thread and simply indicates whether the node is zero; this field is strictly accessed by only the owning thread and therefore will not cause a race condition. I used a bit in `id` to not increase memory consumption, but maybe that's not important - I don't know. I'm also unsure about the performance implications. Let me know if anyone has different ideas; I'm very open to suggestions here. Reviewed By: yfeldblum Differential Revision: D22743031 fbshipit-source-id: 12722ca241a224b546a0f6763b125e922813c844
Showing
Please register or sign in to comment