  1. 09 Jul, 2021 2 commits
    • fix flaky test · 52c78089
      Tim Berning authored
      Summary: AsyncSocketSendmmsgIntegrationTest.PingPongRequest fails during stress runs with multiple jobs due to the 5ms timeout in `UDPClient::sendPing()`. Increasing the timeout fixes the issue.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29623870
      
      fbshipit-source-id: ad6ee0356001fbded9ce01ddc099b708d1477899
    • RequestContext::StaticContextAccessor · bc0818f8
      Pranjal Raihan authored
      Summary: `RequestContext::StaticContextAccessor` acts as a guard, preventing any thread with a `StaticContext` from being destroyed (or created) while the accessor is held.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29536635
      
      fbshipit-source-id: 008a612ec65aa7095450f4d0ab522e8ecd864548
  2. 08 Jul, 2021 8 commits
    • Optimize small_vector copy and move for inline storage · cb36f3c8
      Giuseppe Ottaviano authored
      Summary:
      Codegen for copy and move assignment is suboptimal when the vectors involved are inline, in particular if the destination is default-constructed (which is very common) and even more so if the value type is trivially copyable.
      
      The main optimization is that if the storage is inline and the type is trivially-copyable, we can just copy the whole storage, regardless of the size of the container. This avoids a branchy loop, and it's usually just a handful of MOVs.
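
      A hedged sketch of that fast path, with hypothetical names rather than folly's actual `small_vector` internals:
      ```
      #include <cstddef>
      #include <cstring>
      #include <type_traits>

      // Hypothetical inline-storage layout, for illustration only.
      template <class T, std::size_t N>
      struct InlineVec {
        std::size_t size_ = 0;
        alignas(T) unsigned char storage_[N * sizeof(T)];
      };

      // When T is trivially copyable and both sides are inline, one memcpy of
      // the whole buffer replaces a branchy per-element loop, regardless of
      // size(); it usually compiles down to a handful of MOVs.
      template <class T, std::size_t N>
      typename std::enable_if<std::is_trivially_copyable<T>::value>::type
      copyInline(InlineVec<T, N>& dst, const InlineVec<T, N>& src) {
        std::memcpy(dst.storage_, src.storage_, sizeof(src.storage_));
        dst.size_ = src.size_;
      }
      ```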
      
      While at it, optimized all the related code paths by avoiding delegating to `swap()` and `assign()` when possible, since those introduce further branches that are hard to optimize. Also, move and copy in place when the destination vector is non-empty, avoiding unnecessary destructions.
      
      Also fixed a couple of bugs:
      - The move constructor did not clear its argument when inline, inconsistent with the move assignment operator and with `std::vector`.
      - The move assignment operator passed its heap storage to the argument instead of freeing it.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29592519
      
      fbshipit-source-id: 6281cdda775568d28619afea8b7ca2fb168c7e5d
    • cut Tearable · c226674c
      Yedidya Feldblum authored
      Summary: It is unused. An upcoming Concurrency TS will have a replacement.
      
      Reviewed By: mshneer
      
      Differential Revision: D29597486
      
      fbshipit-source-id: a108b945ce32eb17cedefad89630c9171fc5c9c2
    • Make `travis_docker_build.sh` macos-compatible · cb55944f
      Fred Emmott authored
      Summary:
      BSD readlink doesn't have -f.
      
      I'm not using TravisCI; however, Docker is still convenient for reproducing builds locally.
      
      Reviewed By: yns88
      
      Differential Revision: D29523333
      
      fbshipit-source-id: e01169f3eabca7b8baec95bc70fe119cad201b35
    • Implement contains() in sorted vector types · 1787a34a
      Giuseppe Ottaviano authored
      Summary: Implement C++20 `contains()`.
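
      A minimal usage sketch:
      ```
      #include <folly/sorted_vector_types.h>

      int main() {
        folly::sorted_vector_set<int> s{1, 2, 3};
        // Same semantics as the C++20 associative-container member.
        return s.contains(2) ? 0 : 1;
      }
      ```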
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29556436
      
      fbshipit-source-id: 1c3a8e6e9a07f6e47491b063f88df6bf9da2d87b
    • Make ThreadLocalPtr::Accessor::Iterator default construct as singular · ee8d0a53
      Pranjal Raihan authored
      Reviewed By: yfeldblum, mshneer
      
      Differential Revision: D29578762
      
      fbshipit-source-id: 2aa9af8f68825997db8b71fc97dda5477fa57413
    • Make folly::detail::IteratorAdaptor default-constructible · af7df254
      Pranjal Raihan authored
      Summary:
      [Forward iterators are required to be default constructible](https://en.cppreference.com/w/cpp/named_req/ForwardIterator). So as of today, `IteratorAdaptor` is lying!
      
      We should default construct the wrapped iterator into a [singular iterator](https://eel.is/c++draft/iterator.requirements#general-7):
      > Iterators can also have singular values that are not associated with any sequence. Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value, the assignment of a non-singular value to an iterator that holds a singular value, and, for iterators that meet the `Cpp17DefaultConstructible` requirements, using a value-initialized iterator as the source of a copy or move operation.
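
      For illustration, here is what those exceptions permit for a value-initialized (singular) iterator:
      ```
      #include <vector>

      int main() {
        std::vector<int>::iterator a{};  // value-initialized: a singular iterator
        std::vector<int>::iterator b{};
        auto c = a;  // OK: copying from a value-initialized iterator
        (void)c;
        // Since C++14, value-initialized forward iterators compare equal.
        // Dereferencing or incrementing a singular iterator is undefined.
        return a == b ? 0 : 1;
      }
      ```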
      
      Reviewed By: yfeldblum, mshneer
      
      Differential Revision: D29578765
      
      fbshipit-source-id: 7f2f0762b17f0b1a056532fc5db2abdd76cca3ea
    • Make RequestContext::StaticContext an aggregate struct instead of std::pair · 95f24e43
      Pranjal Raihan authored
      Summary: Let's give them better names than `first` and `second`.
      
      Reviewed By: yfeldblum, mshneer
      
      Differential Revision: D29536636
      
      fbshipit-source-id: bbb37f2e1c0c51dd1ce2a0cfeca85399f409c415
    • Fix waiters signaling in BatchSemaphore · db6d0a10
      Sergey Korytnikov authored
      Summary: Fix BatchSemaphore to post the baton for multiple waiters in the list, as long as there are tokens available.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29430799
      
      fbshipit-source-id: 98b0a616d0ce863108dcf331e491fd2cc12429d1
  3. 07 Jul, 2021 1 commit
    • expand the lock protocol and facilities · b1fa3c6f
      Yedidya Feldblum authored
      Summary:
      Define a single implementation type `lock_base` which handles all cases, including unique/shared/upgrade locks and including sans-state/with-state locks. Add `unique_lock_base`, `shared_lock_base`, and `upgrade_lock_base`.
      
      Revise `upgrade_lock` simply to derive from `upgrade_lock_base`. We may use `upgrade_lock` as an example for specializing `unique_lock` and `shared_lock`.
      
      Remove `ProxyLockableUniqueLock` since `unique_lock_base` absorbs it. Let the `unique_lock` specializations for `DistributedMutex` inherit `unique_lock_base` instead.
      
      Add lock invokers for every lock, try-lock, unlock, and lock-transition member. If these were used only internally to implement the lock-policy types and the lock-transition functions, they might be left in detail, but there may be broader use-cases for at least some of them. Putting the invokers in this header is consistent with proper placement, since this header is intended to own all lock primitives and facilities.
      
      Revise the lock-transition functions to handle cases where the from and to lock types are sans-state/with-state. Expand the set of lock-transition functions for completeness: lock-transitions include x->s, x->u, u->s, u->x; while try-lock-transitions include s->x, s->u, u->x.
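
      A sketch of how the transition functions compose, assuming the `folly::transition_to_*` spellings (check the header for the exact names):
      ```
      #include <folly/SharedMutex.h>
      #include <folly/synchronization/Lock.h>

      void demo(folly::SharedMutex& m) {
        folly::upgrade_lock<folly::SharedMutex> ul{m};   // hold u
        auto xl = folly::transition_to_unique_lock(ul);  // u -> x
        auto sl = folly::transition_to_shared_lock(xl);  // x -> s
      }
      ```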
      
      Reviewed By: aary
      
      Differential Revision: D28767313
      
      fbshipit-source-id: 153adc8270f0f4338db6acf544b8d358556d6f49
  4. 06 Jul, 2021 2 commits
    • Enable fb dynamicconfig loading inside eden backingstore · 4215b920
      Durham Goode authored
      Summary:
      Re-enables dynamicconfig loading in the eden backingstore. Previously it broke edenfs-windows-release, but we believe the opensource/fbcode_builder/manifests/eden tweak has fixed it.
      
      Reviewed By: xavierd
      
      Differential Revision: D29561192
      
      fbshipit-source-id: 775dd21d177f3baa09b0192e7d3f7231008c3766
    • Initialize value in loadUnaligned · eac6011a
      Shai Szulanski authored
      Differential Revision: D29497151
      
      fbshipit-source-id: 64ad1adbd68d10066fc65ddc41e9cff5ef3c6b53
  5. 01 Jul, 2021 2 commits
    • win: add --return-nonzero-on-failures to sc_testpilot · 73484b8a
      Xavier Deguillard authored
      Summary:
      For whatever reason, sc_testpilot defaults to --return-zero-on-failures, which means that test failures are simply not reported properly to the signal box. Fix that by explicitly passing --return-nonzero-on-failures.
      
      Reviewed By: fanzeyi
      
      Differential Revision: D29509158
      
      fbshipit-source-id: ef991f91df48e99769f96ffd8d7015cdf507c723
    • import or backport reinterpret_pointer_cast · d6610d60
      Neel Goyal authored
      Summary: Import or backport `std::reinterpret_pointer_cast` into folly. The relevant guard is the use of the C++17 language and, if feature-test macros are exported, `__cpp_lib_shared_ptr_arrays >= 201611`.
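
      Where the standard-library version is unavailable, the backport can be built on the `shared_ptr` aliasing constructor; a hedged sketch, not necessarily folly's exact code:
      ```
      #include <memory>

      // Share ownership with `r` while pointing at the reinterpreted address.
      template <class T, class U>
      std::shared_ptr<T> reinterpret_pointer_cast(
          const std::shared_ptr<U>& r) noexcept {
        return std::shared_ptr<T>(r, reinterpret_cast<T*>(r.get()));
      }
      ```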
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29433489
      
      fbshipit-source-id: 92596d05f5057cff4c65283711b4ed6778d20758
  6. 30 Jun, 2021 7 commits
    • Regen github actions (#1614) · 1583ff06
      Shai Szulanski authored
      Summary: Pull Request resolved: https://github.com/facebook/folly/pull/1614
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29496725
      
      fbshipit-source-id: 7c726e3e310eb28cf33603ebd83df6d832488369
    • Fix Guard to respect early termination · 6bfd3879
      Tom Jackson authored
      Summary:
      `guard()` is ignoring the return from `handler()`, thereby breaking `take()`.
      
      (Note: this ignores all push blocking failures!)
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29416513
      
      fbshipit-source-id: 1e84d3ba53cb68a1664a349e71398c782b1df751
    • Channels framework · 120926cd
      Andrew Smith authored
      Summary: A channel is a sender and receiver pair that allows one component to send values to another. A sender and receiver pair is similar to an AsyncPipe and AsyncGenerator pair. However, unlike AsyncPipe/AsyncGenerator, senders and receivers can be used by higher level transformation abstractions that are much more memory-efficient than using AsyncGenerator directly. These higher level abstractions do not require long-lived coroutine frames to wait on incoming values.
      
      Reviewed By: aary
      
      Differential Revision: D29158424
      
      fbshipit-source-id: 88c51d0f9d73677a04906197f4c44fe84ac01cdb
    • Fix typo in AtomicSharedPtr (#1610) · af0a489d
      lorinlee authored
      Summary: Pull Request resolved: https://github.com/facebook/folly/pull/1610
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29434206
      
      Pulled By: Orvid
      
      fbshipit-source-id: e20d29d0572307e917f8673bb161e3af1a6c55c3
    • Enforce ALPN match when both client and server support ALPN - folly/openssl · 1f106643
      Fred Qiu authored
      Summary:
      Added options to enforce ALPN when both client and server support ALPN for folly/openssl.
      
      Reviewed By: knekritz
      
      Differential Revision: D29298491
      
      fbshipit-source-id: acdd6001fea89606e2438640a4434cc56454f1aa
    • Enable FixedString construction from a string_view · d7ba0791
      Alex Snast authored
      Summary: `FixedString` should be constructible from a `std::string_view`. This allows passing a `std::string_view` as the key to `.emplace` in heterogeneous maps where the key is a `FixedString`.
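
      A minimal sketch of the new construction path (assuming the capacity is large enough for the view's contents):
      ```
      #include <string_view>

      #include <folly/FixedString.h>

      int main() {
        std::string_view sv{"key"};
        folly::FixedString<8> fs{sv};  // constructible from a string_view
        return fs == "key" ? 0 : 1;
      }
      ```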
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29456948
      
      fbshipit-source-id: 4d2e428c7de05c1dc34b50727f8107aec915b632
    • fix lock order inversion in SharedPromise interrupt handler setting exn · aa605f8e
      Yedidya Feldblum authored
      Summary:
      The approach taken to fixing the lock inversion is not to use locks. Instead, we take a page from the existing futures playbook and use an atomic acyclic finite state machine! In this case, we have a single atomic pointer which may own either a raised object of type `exception_wrapper` or an interrupt-handler. Both are at least 4-byte aligned (8-byte aligned on 64-bit architectures), so both have the bottom two bits free. We use the bottom two bits as the state machine: `0x0u` is the initial state where nothing is owned, `0x1u` means an interrupt-handler is owned, `0x2u` means a raised object is owned, and `0x3u` is the terminal state where both an interrupt-handler and a raised object have been set and the handler has been invoked on the object.
      
      As choices, we forbid repeated calls to set the interrupt handler but permit repeated calls to raise an object. Calls after the first to set an interrupt handler terminate, while calls after the first to raise an object are ignored. Existing tests demonstrate raising twice, so we should be cautious about breaking that behavior.
      
      Some semantics are changed. Raised objects and interrupt handlers are destroyed earlier than before: they are now destroyed immediately after invoking handlers on objects.
      
      The lock order inversion is observed by thread sanitizer (tsan) in the new shared-promise test.
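
      A minimal sketch of the tagged-pointer state machine described above (illustration only, not folly's actual code):
      ```
      #include <atomic>
      #include <cstdint>

      enum : std::uintptr_t {
        kInitial = 0x0u,  // nothing owned
        kHandler = 0x1u,  // interrupt-handler owned
        kRaised  = 0x2u,  // raised object owned
        kDone    = 0x3u,  // both set; handler has been invoked on the object
      };

      struct InterruptState {
        std::atomic<std::uintptr_t> state_{kInitial};

        // Publish a handler; fails if anything was set already. The payload
        // pointer is assumed at least 4-byte aligned, freeing the low two
        // bits to carry the state tag.
        bool trySetHandler(void* handler) {
          std::uintptr_t expected = kInitial;
          auto tagged = reinterpret_cast<std::uintptr_t>(handler) | kHandler;
          return state_.compare_exchange_strong(
              expected, tagged, std::memory_order_acq_rel);
        }

        static void* pointerOf(std::uintptr_t s) {
          return reinterpret_cast<void*>(s & ~std::uintptr_t(0x3u));
        }
      };
      ```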
      
      Differential Revision: D29250149
      
      fbshipit-source-id: 63e257d03cebbf8dba95a514b7e89680cae569a7
  7. 29 Jun, 2021 4 commits
    • Avoid a shared_ptr copy in ConcurrentSkipList::Skipper construction · 6a6e02c2
      Giuseppe Ottaviano authored
      Summary:
      Make `Skipper` accept the `shared_ptr` by value, like `Accessor` does, so we can avoid the copy.
      
      Also expose the accessor, so after we've handed over the `shared_ptr` we can still retain access to the list.
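
      The sink-argument pattern this follows, sketched with hypothetical type names:
      ```
      #include <memory>
      #include <utility>

      struct SkipListSketch {};

      // Taking the shared_ptr by value lets callers move an rvalue in, so a
      // moved-in argument costs no refcount bump at all.
      class SkipperSketch {
       public:
        explicit SkipperSketch(std::shared_ptr<SkipListSketch> list)
            : list_(std::move(list)) {}

       private:
        std::shared_ptr<SkipListSketch> list_;
      };
      ```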
      
      Reviewed By: philippv
      
      Differential Revision: D29464861
      
      fbshipit-source-id: 75e93f0e72c4b8a2635d25e5e7860f27aec2a0a5
    • testpilot: testpilot is broken on Sandcastle · 5b5e814e
      Xavier Deguillard authored
      Summary: Hack to use parexec and the par directly.
      
      Reviewed By: chadaustin
      
      Differential Revision: D29461444
      
      fbshipit-source-id: 9ac6c1aa43728782b8de0a71446109f7fd5dab1d
    • use faster impl in Singleton stack traces · ec15ad3d
      Yedidya Feldblum authored
      Summary: It's already not async-signal-safe so just use the fast version of getting the unsymbolized stack traces.
      
      Reviewed By: chadaustin
      
      Differential Revision: D29439597
      
      fbshipit-source-id: 5a727d6a1c37a1c29bce84d866bf4863774c5ff1
    • Make folly::FixedString::hash constexpr · c03e671e
      Shai Szulanski authored
      Reviewed By: Mizuchi
      
      Differential Revision: D29329759
      
      fbshipit-source-id: 588aa9ce11db2c1aea51dcf96e0de3b50633fa29
  8. 27 Jun, 2021 14 commits
    • Add computeChainCapacity function to IOBuf · f434460f
      Nicolas J. Treat authored
      Summary:
      This diff adds a computeChainCapacity method to IOBuf.
      
      This method loops through each buffer in an IOBuf chain and returns the total capacity of the chain. It is used much like computeChainDataLength(), except that it returns total capacity rather than total data length.
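
      A hedged sketch of the traversal (IOBuf chains are circular, so a do/while over `next()` visits each buffer exactly once):
      ```
      #include <cstddef>

      #include <folly/io/IOBuf.h>

      // What the new method computes: the walk mirrors computeChainDataLength(),
      // but sums capacity() instead of the data length.
      std::size_t chainCapacitySketch(const folly::IOBuf& buf) {
        std::size_t total = 0;
        const folly::IOBuf* cur = &buf;
        do {
          total += cur->capacity();
          cur = cur->next();
        } while (cur != &buf);
        return total;
      }
      ```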
      
      Reviewed By: yfeldblum
      
      Differential Revision: D28875393
      
      fbshipit-source-id: 7e5f4a93916fa6c30eef8f8ee8c346a4537302af
    • Only execute callback if snapshot data has changed · ea968c30
      Jon Maltiel Swenson authored
      Summary: If a callback associated with an observer depends on yet another observer, the callback may be called more than once per distinct snapshot of the associated observer. This diff fixes this problem.
      
      Reviewed By: andriigrynenko
      
      Differential Revision: D29313796
      
      fbshipit-source-id: 345bfa0a1ac9dfaee806be21256088d4667aef15
    • allow LockFreeRingBuffer::Cursor comparison · f5fe4005
      Chad Austin authored
      Summary:
      To support "clearing" a ring-buffer without violating TurnSequencer's
      invariants, allow storing Cursors and comparing them in a loop.
      
      Differential Revision: D29345250
      
      fbshipit-source-id: d1d3bbd0b38489690334bdd63a7f40d9f6bad6c6
    • Remove unnecessary atomic from folly::coro::collectAny · f29212f6
      Shai Szulanski authored
      Summary: The CancellationSource already provides this synchronization.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29368728
      
      fbshipit-source-id: 085467d1fcb70d6fc059174ec673f7825025c076
    • Use SmallUnboundedQueue by default in AsyncPipe · 4011d18c
      Shai Szulanski authored
      Summary: The empty size of UnboundedQueue is hundreds of bytes, which is a problem for many users of AsyncPipe. Switch the default to a smaller queue, adding a template parameter so high-throughput users can switch back.
      
      Differential Revision: D29375068
      
      fbshipit-source-id: 9fb2dad81697eeb70d58929e6b9c1c64102aa4a8
    • folly::coro::SmallUnboundedQueue · 772aab73
      Shai Szulanski authored
      Summary: A smaller version of coro::UnboundedQueue to make AsyncPipe usable in memory-constrained programs.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29375067
      
      fbshipit-source-id: 5d27e4514312d2748055be764af88c0688d46065
    • Make cycle detection FATAL instead of throw and disable it in opt mode · 29bc878d
      Andrii Grynenko authored
      Summary: Cycle detection can be very expensive, so it's better to disable it in opt mode. Because of that we need to make sure we catch such cycles in dbg builds, so we have to replace exceptions with LOG(FATAL).
      
      Reviewed By: joshkirstein
      
      Differential Revision: D29367695
      
      fbshipit-source-id: 9c2038eb5b42f98f4ab997f963b6a131b8d26cf9
    • update the Core fake layout for testing · 45666e3d
      Yedidya Feldblum authored
      Summary: There is a golden image of the `Core` layout so that tests will catch accidental increases of the `Core` size. Updated it to the latest `Core` layout.
      
      Differential Revision: D29251632
      
      fbshipit-source-id: d16086390548848f4302678e0b86d9841be1140b
    • Use is_pod, add <system_error> include for TcpInfo · 40d61f3f
      Brandon Schlinker authored
      Summary:
      - `std::is_pod_v` is only available in C++17, shifting to `std::is_pod`
      - `std::errc` needs the `system_error` header; missing header wasn't causing an error on most platforms because it is included indirectly
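
      Both fixes in isolation (a sketch; `std::is_pod_v<T>` is just the C++17 `_v` shorthand for the C++11 trait):
      ```
      #include <system_error>  // for std::errc; don't rely on transitive includes
      #include <type_traits>

      static_assert(std::is_pod<int>::value, "expected a POD type");
      // C++17-only equivalent: static_assert(std::is_pod_v<int>, "...");
      ```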
      
      Differential Revision: D29355681
      
      fbshipit-source-id: e035c3f4ffac9d2c6f0d8ec511f7e0ea8544ba80
    • Remove dependency on FixedString.h from TcpInfo · c634f9c3
      Brandon Schlinker authored
      Summary: Remove dependency on `folly/FixedString.h`
      
      Differential Revision: D29317987
      
      fbshipit-source-id: dbce91f117776a1dcd966230d9eed616b2a95613
    • Add collectAnyNoDiscard() · 5dff2932
      Rob Lyerly authored
      Summary:
      D28945040 added `collectAny()`, which early-returns when any of the SemiAwaitables produces a value (or exception). There's the potential for discarded results, however: multiple SemiAwaitables can produce results, depending on whether they're at a cancellation point and on when cancellation is signaled.
      
      This diff adds a variant `collectAnyNoDiscard()` that signals cancellation when any SemiAwaitable finishes and returns *all* results that completed.  It produces a tuple of optional values from the SemiAwaitables (or `std::nullopt` if it was canceled).
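
      A hedged usage sketch based on that description, with hypothetical task names; each completed awaitable's result is retained, and cancelled ones yield an empty optional:
      ```
      #include <utility>

      #include <folly/experimental/coro/Collect.h>
      #include <folly/experimental/coro/Task.h>

      folly::coro::Task<void> demo(
          folly::coro::Task<int> a, folly::coro::Task<int> b) {
        auto [ra, rb] = co_await folly::coro::collectAnyNoDiscard(
            std::move(a), std::move(b));
        if (ra.has_value()) { /* a completed */ }
        if (rb.has_value()) { /* b completed */ }
      }
      ```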
      
      Reviewed By: iahs
      
      Differential Revision: D29240725
      
      fbshipit-source-id: 3e664339e8692cbb9114138a96345cf9f9d5cb0b
    • move watchman includes into their own directory · 28858c2e
      Chad Austin authored
      Summary:
      Bring Watchman closer to the EdenFS file name convention by moving
      source files and includes into watchman/.
      
      Reviewed By: fanzeyi
      
      Differential Revision: D29242789
      
      fbshipit-source-id: 6e29a4a50e7202dbf6b603ccc7e4c8184afeb115
    • Move test utilities into TcpInfoTestUtil · 384b72ff
      Brandon Schlinker authored
      Summary: Move some of the test utilities into `TcpInfoTestUtil.h` to enable use by other components.
      
      Differential Revision: D29307490
      
      fbshipit-source-id: 978947ff57ed02e438addf6190a4ea9955596333
    • Fix concurrency issues in ConcurrentSkipList. · 6f4811ef
      Robin Cheng authored
      Summary:
      ## An Overview of ConcurrentSkipList Synchronization
      folly::ConcurrentSkipList, at a high level, is a normal skip list, except:
      * Access to the nodes' "next" pointers are atomic. (The implementation used release/consume, which this diff changes to release/acquire. It's not clear the original use for consume was correct, and consume is very complicated without any practical benefit, so we should just avoid it.)
      * All accesses (read/write) must go through an Accessor, which basically is nothing except an RAII object that calls addRef() on construction and releaseRef() on destruction.
      * Deleting a node will defer the deletion to a "recycler", which is just a vector of nodes to be recycled. When releaseRef() drops the refcount to zero, the nodes in the recycler are deleted.
      
      Intuitively speaking, when refcount drops to zero, it is safe to delete the nodes in the recycler because nobody holds any Accessors. It's a very simple way to manage the lifetime of the nodes, without having to worry about *which* nodes are accessed or to be deleted.
      
      However, this refcount/recycling behavior is very hard to get right when using atomics as the main synchronization mechanism. In the buggy implementation before this diff, I'll highlight three relevant parts:
      * To find an element in the skip list (either to fetch, insert, or delete), we start from the head node and skip forward by following successor pointers at the appropriate levels until we arrive at the node in question. Rough pseudocode:
        ```
        def find(val):
          node = head
          while read(node) < val:
            node = skip(node)  # read the node to find the successor at some level
          return node
        ```
      
      * To delete an element from the skip list, after finding the element, we modify the predecessors at each level by changing their successors to point to the deleted element's successors, and place the deleted element into the recycler.
        ```
        def delete(node):
          for level in range(...):
            node->pred->setSkip(node->succ)
          recycle(node)
      
        def recycle(node):
          lock(RECYCLER_LOCK)
          recycler.add(node)
          unlock(RECYCLER_LOCK)
        ```
      
      * releaseRef() and addRef():
        ```
        def releaseRef():
          if refcount > 1:
            refcount--
            return
      
          lock(RECYCLER_LOCK)
          if refcount > 1:
            refcount--
            unlock(RECYCLER_LOCK)
            return
          to_recycle, recycler = recycler, [] # swap out nodes to recycle
          unlock(RECYCLER_LOCK)
      
          for node in to_recycle:
            free(node)
          refcount--
      
        def addRef():
          refcount++
        ```
      
      ## The Safety Argument
      The Accessor/deletion mechanism is safe if we can ensure the following:
      * If for a particular node X, a thread performs read(X) and a different thread performs free(X), the read(X) happens-before free(X).
      
      ### Problem 1: Relaxed decrement
      The buggy implementation used relaxed memory order when doing `refcount--`. Let's see why this is a problem.
      
      Let thread 1 be the one performing read(X) and let thread 2 be the one performing free(X). The intuitive argument is that free(X) can only happen after the refcount drops to zero, which cannot be true while read(X) is happening. The somewhat more formal argument is that read(X) happens before refcount-- in thread 1, which happens before refcount-- in thread 2, which happens before free(X). But because we use relaxed memory order, the two refcount-- operations do not synchronize.
      
      ### Problem 2: Relaxed increment
      The implementation also used relaxed memory order for addRef(). Normally, for a refcount, it is OK to use relaxed increments, but this refcount is different: the count can go back up once it reaches zero. When reusing refcounts this way, we can no longer use relaxed increment.
      
      To see why, suppose thread 2 performs the following in this order:
      ```
      setSkip(P, not X)  # step before deleting X; P is a predecessor at some level
      recycle(X)
      refcount-- # acq_rel, gets 0
      free(X)
      ```
      and thread 1 performs the following in this order:
      ```
      refcount++ # relaxed
      skip(P) # gets X
      read(X)
      ```
      See Appendix A below; it's possible for the refcount to reach 0, and for thread 1 to get X when reading the successor from P. This means that free(X) might theoretically still race with read(X), as we failed to show that once we delete something from the skip list, another accessor can't possibly reach X again.
      
      This might feel like an unnecessary argument, but without this reasoning, we would not have found a problem even if we just modified releaseRef() to instead delete some random nodes from the list. That will certainly cause trouble when someone else later tries to read something.
      
      ### Problem 3: No release operation on refcount before freeing nodes
      This is much more subtle. Suppose thread 1 performs the following:
      ```
      refcount--  # end of some previous accessor
      refcount++  # begin of new accessor
      skip(P)
      ```
      and thread 2 does this:
      ```
      setSkip(P, not X)  # step before deleting X; P is a predecessor
      recycle(X)
      read(refcount) # gets 1, proceeds to lock
      lock(RECYCLER_LOCK)
      read(refcount) # gets 1  ***
      unlock(RECYCLER_LOCK)
      free(X)
      refcount--
      ```
      
      The interleaving to make this possible is:
      ```
      thread 1 refcount--
      thread 2 everything until free(X)
      thread 1 refcount++
      thread 2 free(X)
      thread 2 refcount--
      ```
      
      We wish to show that `setSkip(P, not X)` happens before `skip(P)`, because this will allow us to prove that `skip(P) != X` (otherwise, we would not be able to show that a subsequent read(X) in thread 1 happens before free(X); it might legitimately not).
      
      The intuitive argument is that `setSkip(P, not X)` happened before we decremented the refcount, which happens before the increment of the refcount in thread 1, which happens before `skip(P)`. However, because thread 2 actually decremented refcount quite late, it might be the case that thread 1's `refcount++` happened before thread 2's `refcount--` (and the increment synchronized with its own decrement earlier). There's nothing else in the middle that provides a synchronizes-with relationship (in particular, the read(refcount) operations do not provide synchronization because those are *loads*, which is the wrong direction!).
      
      ### Correct implementation
      In addition to using acq_rel memory order on all operations on refcount, this diff modifies releaseRef() like this:
      ```
      def releaseRef():
        if refcount > 1: # optimization
          refcount--
          return
      
        lock(RECYCLER_LOCK)
        if --refcount == 0:  # "GUARD"
          to_recycle, recycler = recycler, [] # swap out nodes to recycle
        unlock(RECYCLER_LOCK)
      
        for node in to_recycle:
          free(node)
      ```
      
      I'll use "GUARD" to denote the event that the --refcount within the lock *returns 0*. I'll use this for the correctness proof.
      
      ### Correct implementation proof
      The proof will still be to show that if thread 1 performs read(X) and thread 2 performs free(X), read(X) happens-before free(X).
      
      Proof: thread 1 must have grabbed an accessor while reading X, so its sequence of actions look like this:
      ```
      refcount++
      skip(P)  # gets X
      read(X)
      refcount--
      ```
      thread 2 performs:
      ```
      GUARD
      free(X)
      ```
      
      Now, all writes on refcount are RMW operations and they all use acq_rel ordering, so all the RMW operations on refcount form a total order where successive operations have a synchronizes-with relationship. We'll look at where GUARD might stand in this total order.
      
      * Case 1: GUARD is after refcount-- from thread 1 in the total order.
        * In this case, read(X) happens before refcount-- in thread 1, which happens before GUARD, which happens before free(X).
      * Case 2: GUARD is between refcount++ and refcount-- from thread 1 in the total order.
        * In this case, observe (by looking at the total ordering on refcount RMW) that we have at least two threads (1 and 2) that contribute 1 to the refcount, right before GUARD. In other words, GUARD could not possibly have returned 0, which is a contradiction.
      * Case 3: GUARD is before refcount++ from thread 1 in the total order.
        * Let `setSkip(P, not X)` be a predecessor write operation before X is added to the recycler (this can happen on any thread). We will prove in the below lemma that `setSkip(P, not X)` happens before GUARD. Once we have that, then `setSkip(P, not X)` happens before GUARD which happens before thread 1's refcount++ which happens before `skip(P)`, and that renders it impossible for `skip(P)` to return X, making it a contradiction.
      
      It remains to prove that `setSkip(P, not X)` happens before GUARD.
      
      In the thread that performs a `setSkip(P, not X)` operation, it subsequently performs `recycle(X)`, which adds X to the recycler within RECYCLER_LOCK. In thread 2, GUARD happens within RECYCLER_LOCK, and the subsequent swapping of the recycler vector contained X (which is shown by the fact that we free(X) after GUARD), so the lock must have been grabbed *after* the critical section that added X to the recycler. In other words, we have the relationship that `setSkip(P, not X)` happens before `recycler.add(X)`, which happens before `unlock(RECYCLER_LOCK)`, which happens before `lock(RECYCLER_LOCK)`, which happens before GUARD.
      
      Note that just like the original implementation, the optimization on top of releaseRef() is not a perfect optimization; it may delay the deletion of otherwise safe-to-delete nodes. However, that does not affect our correctness argument because it's always at least as safe to *delay* deletions (this hand-wavy argument is not part of the proof).
      
      ## Appendix A
      Consider the following two threads:
      ```
      std::atomic<int> x{0}, y{1};
      // Thread 1:
      x.store(1, std::memory_order_release);  // A
      int y1 = y.fetch_add(-1, std::memory_order_acq_rel);  // B
      
      // Thread 2:
      y.fetch_add(1, std::memory_order_relaxed);  // C
      int x2 = x.load(std::memory_order_acquire);  // D
      ```
      Intuitively, if y1 = 1, then thread 1's fetch_add was executed first, so thread 2 should get x2 = 1. Otherwise, if thread 2's fetch_add happened first, then y1 = 2, and x2 could be either 0 or 1.
      
      But, could it happen that y1 = 1 and x2 = 0? Let's look at the happens-before relationships between these operations. For intra-thread (sequenced-before), we have A < B and C < D (I'm using < to denote happens-before). Now, for inter-thread (synchronizes-with), the only pair that could establish a synchronizes-with relationship is A and D. (B and C are not eligible because C uses relaxed ordering.) For y1 = 1 and x2 = 0 to happen, we must not have D read the result of A, so it must be the case that A does not synchronize-with D. But that's as far as we can go; there's nothing that really enforces much of an ordering between the two threads.
      
      We can also think about this in terms of reordering of memory operations. Thread 1 is pretty safe from reordering because of the acq_rel, but in thread 2, an "acquire" ordering typically means no memory operations after D may be reordered before D, but it doesn't prevent C from being reordered after D. C itself does not prevent reordering due to it being a relaxed operation. So if thread 2 executed D and then C, then it would be trivially possible to have y1 = 1 and x2 = 0.
      
      The point of this is to highlight that just by having a release/acquire pair does not magically *order* them. The pair merely provides a synchronizes-with relationship *if* the read happens to obtain the value written by the write, but not any guarantees of *which* value would be read.
      
      ## Appendix B
      Problem 1 is detected by TSAN, but problems 2 and 3 are not. Why?
      
      TSAN detects data races by deriving synchronization relationships from the *actual* interleaving of atomics at runtime. If the actual interleaving would always happen but is not *guaranteed* by the standard, there may be a real undetected data race.
      
      For example, it is well known that the following code will be detected by TSAN as a data race on int "x":
      ```
      int x = 1;
      std::atomic<bool> y{false};
      
      // Thread 1
      x = 2;
      y.store(true, memory_order_relaxed);
      
      // Thread 2
      while (!y.load(memory_order_acquire)) {  // acquire is useless because writer used relaxed
      }
      std::cout << x << std::endl;
      ```
      TSAN reports a data race on x because `y` failed to provide proper synchronizes-with relationship between the two threads due to incorrect memory ordering. However, when compiling on x86, most likely we will end up with a binary that always guarantees the intuitively desired behavior anyway.
      
      So now consider the following code:
      ```
      std::atomic<int> x{1};
      std::atomic<bool> y{false};
      int z = 8;
      
      // Thread 1
      z = 9;
      x.store(2, memory_order_release);
      y.store(true, memory_order_relaxed);
      
      // Thread 2
      while (!y.load(memory_order_acquire)) {  // acquire is useless because writer used relaxed
      }
      x.load(memory_order_acquire);
      std::cout << z << std::endl;
      ```
      There is a data race on the access to z, because the happens-before chain of `write z -> x.store -> y.store -> y.load -> x.load -> read z` is broken on the `y.store -> y.load` link. However, TSAN will not report a data race, because it sees the chain as `write z -> x.store -> x.load -> read z`. It sees x.store as synchronizing with x.load because it *observed* that the x.load obtained the value written by the x.write *at runtime*, so it inferred that it was valid synchronization. This isn't guaranteed, though, because it's possible in some execution (in theory) that x.load does not get the value written by x.store (similar to Appendix A).
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29248955
      
      fbshipit-source-id: 2a3c9379c7c3a6469183df64582ca9cf763c0890