Fix a ThreadLocal bug: hold the meta lock when resizing the element vector
Summary:
There appears to be a race here. leizha reported issues with a heavily recycled AtomicHashMap (with a ThreadCachedInt inside). It looks like what's happening is this:

- Thread A: ~ThreadCachedInt from an AHM
  - the meta lock is taken, and the ThreadElement list is iterated
  - all entries are zeroed, and the id is marked free
  - the lock is then released
- Thread B: someone is calling get() on an unrelated id
  - hits reserve: rallocm on the pointer, or an unsynchronized memcpy from the element vector
  - waits on the lock
  - when it gets the lock, it stores back the stale copy it read, restoring the pointer that A had zeroed

Later, someone reuses the id from the freelist, reuses the previously freed pointer, and eventually double-frees it. (nullptr is the signifier for "this thread doesn't have an instance of the threadlocal yet".)

Test Plan:
leizha's test case doesn't segv after this diff; it was reliably breaking with corruption in malloc before it. I'm working on making that test case into a unit test to add to this diff, but I'm putting it up early in case there's something wrong with the theory above or in case someone has an idea for a better fix.

Reviewed By: tudorb@fb.com

FB internal diff: D928534
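
For illustration only, the locking discipline named in the title can be sketched as below. This is a simplified model, not the actual folly code: `Meta`, `ThreadEntry`, `destroyId`, and `reserve` are hypothetical stand-ins, and a plain `std::vector` copy replaces the rallocm/memcpy path. The point it shows is that the copy of the old element vector happens while the meta lock is held, so a concurrent destructor cannot zero a slot between the copy and the store.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for illustration; not folly's real types.
struct ThreadEntry {
  std::vector<void*> elements;  // slot i: this thread's instance for id i
};

struct Meta {
  std::mutex lock;                    // the "meta lock"
  std::vector<ThreadEntry*> threads;  // every registered thread
};

// Destructor path (thread A): under the meta lock, zero the slot for `id`
// in every thread. Returns the pointers the caller must now free.
std::vector<void*> destroyId(Meta& meta, std::size_t id) {
  std::vector<void*> owned;
  std::lock_guard<std::mutex> guard(meta.lock);
  for (ThreadEntry* t : meta.threads) {
    if (id < t->elements.size() && t->elements[id] != nullptr) {
      owned.push_back(t->elements[id]);
      t->elements[id] = nullptr;
    }
  }
  return owned;
}

// get() slow path (thread B): grow this thread's vector so it can hold `id`.
// The lock is taken BEFORE the old contents are copied, so the copy cannot
// capture a pointer that destroyId() is about to zero and free.
void reserve(Meta& meta, ThreadEntry& self, std::size_t id) {
  std::lock_guard<std::mutex> guard(meta.lock);
  if (id < self.elements.size()) {
    return;  // already large enough
  }
  std::vector<void*> grown(id + 1, nullptr);
  for (std::size_t i = 0; i < self.elements.size(); ++i) {
    grown[i] = self.elements[i];
  }
  self.elements.swap(grown);
  assert(id < self.elements.size());
}
```

In the racy variant described in the summary, the copy loop would run before the lock is acquired and only the final swap would be protected, which is what lets a stale pointer be written back.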