1. 22 Jul, 2021 2 commits
    • Giuseppe Ottaviano's avatar
      Use CoreCachedSharedPtr in Singleton · 937fc980
      Giuseppe Ottaviano authored
      Summary: `CoreCachedSharedPtr` is almost as fast as `ReadMostlySharedPtr`, so we can use it to have a better default that does not have pathological behavior under heavy contention. `try_get_fast()` is still useful if we need to squeeze out the last cycle.
      
      Reviewed By: philippv, luciang
      
      Differential Revision: D29812053
      
      fbshipit-source-id: 49e9e53444f8099dbfe13e36c3c07c1b57bb89fb
      937fc980
    • Lucian Grijincu's avatar
      thrift: varint: BMI2 (pdep) based varint encoding: branchless 2-5x faster than loop unrolled · 4baba282
      Lucian Grijincu authored
      Summary:
      BMI2 (`pdep`) varint encoding that's mostly branchless. It's 2-5x faster than the current loop-unrolled version.
      
      Being mostly branchless there's less variability in micro-benchmark runtime compared to the loop-unrolled version:
      - the loop-unrolled versions are slowest when encoding random numbers across the entire 64-bit range (some likely large) and branch prediction has most failures.
      
      Kept the fast-pass for values <127 (encoded in 1 byte) which are likely to be frequent. I couldn't find a fully branchless version that performed better anyway.
      
      TLDR:
      - `u8`: unroll the two possible values (1B and 2B encoding). Faster in micro-benchmarks than branchless versions I tried (needed more instructions to produce the same value without branches).
      - `u16` & `u32`:
      -- u16 encodes in up to 3B, u32 in up to 5B.
      -- Use `pdep` to encode into a u64 (8 bytes). Write 8 bytes to `QueueAppender`, but keep track of only the bytes that had to be written. This is faster than appending a buffer of bytes using &u64 and size.
      -- u16 could be written by encoding using `_pdep_u32` (3 bytes max fit in u32) and using smaller 16B lookup tables. In micro-benchmark that's not faster than using the same code as the one to encode u32 using `_pdep_u64`. In prod will perform better due to sharing the same lookup tables with u32 and u64 versions (less d-cache pressure).
      - `u64`: needs up to 10B. `pdep` to encode first 8B and unconditionally write last 2B too (but keep track of `QueueAppender` size properly).
      
      Reviewed By: vitaut
      
      Differential Revision: D29250074
      
      fbshipit-source-id: 1f6a266f45248fcbea30a62ed347564589cb3348
      4baba282
  2. 21 Jul, 2021 6 commits
    • Samuel Miller's avatar
      Factor ticket key manager into handler interface · dd7d175a
      Samuel Miller authored
      Summary:
      I've created an abstract class called `OpenSSLTicketHandler` to which the
      server's ssl context dispatches ticket crypto preparation. This means that
      changing the configuration of how tickets are encrypted can be changed by
      using a different implementation of the handler.
      
      This abstraction is sort of broken by all the set and get APIs on the context
      that modify ticket secrets, rather than properly abstracting this detail into
      a concrete impl of the handler (e.g. a `TLSTicketKeyManager` can manage secrets
      from a file and not require users to pass all these secrets down from the
      acceptors). For now though we rely on (checked!) dynamic casts to get a
      `TLSTicketKeyManager` from which we can freely modify the secrets.
      
      Reviewed By: mingtaoy
      
      Differential Revision: D24686664
      
      fbshipit-source-id: fb30941982fb3114e2aba531372a9d35ccc0ee48
      dd7d175a
    • Felix Handte's avatar
      Allow JemallocHugePageAllocator to Grow · d4241c98
      Felix Handte authored
      Summary:
      This diff switches the JemallocHugePageAllocator to permit initially reserving
      a (very) large contiguous region of virtual address space, which can then be
      progressively backed by huge pages on demand.
      
      This is intended to increase the flexibility of the allocator. In particular,
      this should enable automatic initialization of the JHA without having to
      carefully and correctly size the arena. (Assumption: `mmap()`'ing a big chunk
      of address is cheap, right? Although even if it is this is probably something
      we should only do for long-lived processes. Identifying "long-lived processes"
      and triggering initialization in them is itself an open topic to be addressed
      in a subsequent diff.)
      
      The concern here is that this potentially moves `madvise(..., MADV_HUGEPAGE)`
      calls later into the process lifetime, when memory pressure/fragmentation may
      be greater. This might induce stalls in the process? This can be mitigated
      using the existing pattern of explicitly calling `::init()` in your processes'
      `main()`. This will commit the requested pages.
      
      Context: I intend to use this allocator to back allocations in the Zstd
      Compression Context Singletons.
      (`folly/compression/CompressionContextPoolSingletons.cpp`)
      
      Feedback on the approach taken here is greatly appreciated!
      
      Reviewed By: davidtgoldblatt
      
      Differential Revision: D29502147
      
      fbshipit-source-id: 814e1ba3544cf5b5cfb67a08abd18f940255362f
      d4241c98
    • Giuseppe Ottaviano's avatar
      Add CoreCachedWeakPtr::lock() method, improve benchmarks · 9acfd80d
      Giuseppe Ottaviano authored
      Summary:
      Currently `CoreCachedWeakPtr` only exposes `get()`, which returns by value, so to lock the pointer we need to do two refcount operations, one on the weak count and one on the shared count. This is expensive.
      
      We could return by `const&`, but I don't want to expose the internal state, as we may be able to optimize the footprint by relying on implementation details of `std::shared/weak_ptr` in the future.
      
      Instead, expose a natural `lock()` method.
      
      Also, improve the benchmarks:
      - Add comparison with `ReadMostlySharedPtr`
      - Ensure that all threads are busy for the same time, so that wall time * `numThreads` is a good approximation of overall CPU time.
      
      Reviewed By: philippv
      
      Differential Revision: D29762995
      
      fbshipit-source-id: 851a82111e2726425e16d65729ec3fdd21981738
      9acfd80d
    • Michael Stella's avatar
      Remove unnecessary string copy in JSON serialization · 6696e55c
      Michael Stella authored
      Summary:
      `asString()` returns a string by value, which means a copy of the string must be made.
      
      We don't actually need to do this at all:
      - We know that the dynamic contains a string, thanks to the switch statement
      - The function being called with the result, `escapeString`, only wants a StringPiece anyway.
      
      So this copy is a waste. And uses significant CPU in my application.
      
      Reviewed By: ispeters
      
      Differential Revision: D29812197
      
      fbshipit-source-id: 60df668f7501f78f4282717d6896cd891950b6f5
      6696e55c
    • Dan Oprescu's avatar
      CancellableAsyncScope pass through the correct returnAddress · cdb7a478
      Dan Oprescu authored
      Reviewed By: capickett
      
      Differential Revision: D29764496
      
      fbshipit-source-id: 855bcfc749358d3754b604e417cf5512a91ab6df
      cdb7a478
    • Giuseppe Ottaviano's avatar
      Use small_vector::copyInlineTrivial only if the storage is small · f623e994
      Giuseppe Ottaviano authored
      Reviewed By: Liang-Dong
      
      Differential Revision: D29784019
      
      fbshipit-source-id: f7516603bc73b33ad7c57da1103451f75f8566b4
      f623e994
  3. 20 Jul, 2021 6 commits
  4. 16 Jul, 2021 4 commits
    • Mihnea Olteanu's avatar
      Fix stub of sockets for EMSCRIPTEN and XROS · b8fdbc94
      Mihnea Olteanu authored
      Summary:
      The current implementation of function stubs in `SocketFileDescriptorMap.cpp` generates the following build errors:
      ```
      stderr: xplat/folly/net/detail/SocketFileDescriptorMap.cpp:171:3: error: 'socketToFd' has a non-throwing exception specification but can still throw [-Werror,-Wexceptions]
        throw std::logic_error("Not implemented!");
        ^
      xplat/folly/net/detail/SocketFileDescriptorMap.cpp:170:30: note: function declared non-throwing here
      int SocketFileDescriptorMap::socketToFd(void* sock) noexcept {
      ```
      because the methods are stubbed out to throw and exception even though they are marked as `noexcept`.
      
      To fix the warning the subbing implementation is changed to call `std::terminate()` instead of throwing an exception. According to the language specification (https://en.cppreference.com/w/cpp/language/noexcept_spec) this should not result in any change in run-time behavior, since throwing and exception in a method marked as `noexcept` is effectively a call to `std::terminate()`.
      
      Differential Revision: D29687674
      
      fbshipit-source-id: 77405d8a31e45c8836e8746c9b25e12ef06335c4
      b8fdbc94
    • Xintong Hu's avatar
      Add API to set cmsg for write · 653703a3
      Xintong Hu authored
      Summary: allow users to set/append a list of cmsgs to be sent for each write
      
      Reviewed By: bschlinker
      
      Differential Revision: D29313594
      
      fbshipit-source-id: 8f78c59ecfe56ddb2c8c016d6105a676cd501c18
      653703a3
    • Alex Zhu's avatar
      Add AsyncSSLSocket::setSupportedProtocols · ca7ce442
      Alex Zhu authored
      Summary: This diff adds AsyncSSLSocket::setSupportedProtocols, analagous to SSL_set_alpn_protos, which allows connection specific ALPNs to be set. Prior to this diff, there was no easy way to change the set of ALPNs to use other than creating a separate SSLContext or manually using the low level OpenSSL interface.
      
      Reviewed By: mingtaoy
      
      Differential Revision: D29247716
      
      fbshipit-source-id: 6f378b4fc75f404e06fe0131ab520b7e4b8f33b6
      ca7ce442
    • Dan Melnic's avatar
      Add support for Subprocess to call sched_setaffinity · 2f7fdc20
      Dan Melnic authored
      Summary: Add support for Subprocess to call sched_setaffinity
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29722725
      
      fbshipit-source-id: b0d4577bf3caaeb65137c8168bd27e1f402969da
      2f7fdc20
  5. 15 Jul, 2021 3 commits
    • Shai Szulanski's avatar
      Reorder definitions in AsyncGenerator.h · d26d241b
      Shai Szulanski authored
      Summary: Prepares for next diff
      
      Reviewed By: Mizuchi
      
      Differential Revision: D29665910
      
      fbshipit-source-id: 1026b0836ca803d566086ab9ed8e13e36d607c5f
      d26d241b
    • Philip Pronin's avatar
      fix semantics of QMS::Iterator::skipTo · bdf37414
      Philip Pronin authored
      Summary: Passing large `key` doesn't correctly advance position to the end.
      
      Reviewed By: ot
      
      Differential Revision: D29712973
      
      fbshipit-source-id: 7da7c49250753c12f3703ccf49107e56bf841131
      bdf37414
    • Aaryaman Sagar's avatar
      Have collect() handle the case of a not-ready future · ea91c9bc
      Aaryaman Sagar authored
      Summary:
      If one of the input futures is off the end of a folly::Executor::weakRef()
      executor, then there is a chance that it may never complete with a value or an
      ecception.  In this case, collect() would crash because it assumes that the
      folly::Try instances for all input futures have either a value or an exception.
      
      Fix that case by injecting a BrokenPromise exception for the case where a future
      never has an exception or a value.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D26989091
      
      fbshipit-source-id: b810fe4d5d071233da1f453b3759991e057d78c6
      ea91c9bc
  6. 14 Jul, 2021 1 commit
    • Pranjal Raihan's avatar
      Remove unused UniqueInstance::PtrRange · 74f3c043
      Pranjal Raihan authored
      Summary: `UniqueInstance.cpp` has its own `PtrRange` which it exclusively uses.
      
      Reviewed By: yfeldblum, Mizuchi
      
      Differential Revision: D29685352
      
      fbshipit-source-id: 32658b3ee6fc1830c2c2f27693baefa16026f13e
      74f3c043
  7. 13 Jul, 2021 3 commits
    • Geoff Oakham's avatar
      TokenBucketStorage primitive extracted from BasicDynamicTokenBucket · 0f00cc10
      Geoff Oakham authored
      Summary:
      Extract the underlying primitive that powers BasicDynamicTokenBucket, as TokenBucketStroage. This abstraction speaks to "tokens" (instead of time) and allows for arbitrary borrowing.  It also handles thread safety.
      
      This allows other bucket token based libraries to be created (eg. a Cooperative Scheduler) using the same primitive.
      
      API compatibility has been maintained with the existing classes.
      
      Reviewed By: yfeldblum, rohithmenon
      
      Differential Revision: D29522354
      
      fbshipit-source-id: 010bd9caf3f9c3726621e87d23e5c564773424b7
      0f00cc10
    • Pranjal Raihan's avatar
      Back out "Don't use typeid without RTTI in UniqueInstance" · 7a18d182
      Pranjal Raihan authored
      Reviewed By: prajay
      
      Differential Revision: D29673416
      
      fbshipit-source-id: f08fa89db74198aa648e871837f1f7b841cd8d6b
      7a18d182
    • Pranjal Raihan's avatar
      Back out "RequestContext::StaticContextAccessor" · 436e7fd6
      Pranjal Raihan authored
      Reviewed By: mshneer
      
      Differential Revision: D29673244
      
      fbshipit-source-id: 62676c498a60c720d463703f5f1e24b421683a85
      436e7fd6
  8. 12 Jul, 2021 4 commits
  9. 09 Jul, 2021 6 commits
    • Mingtao Yang's avatar
      Move FindZstd.cmake into fbcode_builder · 42707737
      Mingtao Yang authored
      Summary:
      A bunch of existing projects vendor it into their own CMake folder. But these
      same projects also have fbcode_builder's CMake directory in their CMAKE_MODULE_PATH,
      so move FindZstd here.
      
      Differential Revision: D29637686
      
      fbshipit-source-id: 805676e18f98ef217dea8511d039edc38771b529
      42707737
    • Fred Emmott's avatar
      Mark zstd as a dependency of fizz · a405d073
      Fred Emmott authored
      Summary:
      I don't understand why we have both manifests/specs; I was assuming one was generated from the other,
      but I don't see a at-generated and grep doesn't show up anything, so both are manual edits.
      
      Reviewed By: iahs
      
      Differential Revision: D29524000
      
      fbshipit-source-id: 5f6df62f0162ea24a9903bedf5d220eab5c2dff1
      a405d073
    • Zsolt Dollenstein's avatar
      Opt in opensource/fbcode_builder to pyfmt · 40803d69
      Zsolt Dollenstein authored
      Reviewed By: zertosh
      
      Differential Revision: D29612107
      
      fbshipit-source-id: ac450058134e23a3831db35d2e49c80eb8cde36a
      40803d69
    • Dong Li's avatar
      Revert D29536635: RequestContext::StaticContextAccessor · c2ea3761
      Dong Li authored
      Differential Revision:
      D29536635 (https://github.com/facebook/folly/commit/bc0818f89e7ab3802cdac95a25e29f4061021c3c)
      
      Original commit changeset: 008a612ec65a
      
      fbshipit-source-id: 2c18348732f1b1f8022e3cc8dce5a5684ce16108
      c2ea3761
    • Tim Berning's avatar
      fix flaky test · 52c78089
      Tim Berning authored
      Summary: AsyncSocketSendmmsgIntegrationTest.PingPongRequest fails during stress runs with multiple jobs due to the 5ms timeout in `UDPClient::sendPing()`. Increasing the timeout fixes the issue.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29623870
      
      fbshipit-source-id: ad6ee0356001fbded9ce01ddc099b708d1477899
      52c78089
    • Pranjal Raihan's avatar
      RequestContext::StaticContextAccessor · bc0818f8
      Pranjal Raihan authored
      Summary: `RequestContext::StaticContextAccessor` acts as a guard, preventing all threads with a `StaticContext` from being destroyed (or created).
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29536635
      
      fbshipit-source-id: 008a612ec65aa7095450f4d0ab522e8ecd864548
      bc0818f8
  10. 08 Jul, 2021 5 commits
    • Giuseppe Ottaviano's avatar
      Optimize small_vector copy and move for inline storage · cb36f3c8
      Giuseppe Ottaviano authored
      Summary:
      Codegen for copy and move assignment is suboptimal when the vectors involved are inline, in particular if the destination is default-constructed (which is very common) and even more so if the value type is trivially copyable.
      
      The main optimization is that if the storage is inline and the type is trivially-copyable, we can just copy the whole storage, regardless of the size of the container. This avoids a branchy loop, and it's usually just a handful of MOVs.
      
      While at it, optimized all the related code paths by avoiding delegating to `swap()` and `assign()` when possible, which introduce further branches that are hard to optimize. Also move and copy in place when the vector is non-empty, avoiding unnecessary destructions.
      
      Also fixed a couple of bugs:
      - The move constructor did not clear its argument when inline, inconsistently with the move assignment and `std::vector`
      - The move assignment operator passed its heap storage to the argument instead of freeing it.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29592519
      
      fbshipit-source-id: 6281cdda775568d28619afea8b7ca2fb168c7e5d
      cb36f3c8
    • Yedidya Feldblum's avatar
      cut Tearable · c226674c
      Yedidya Feldblum authored
      Summary: It is unused. An upcoming Concurrency TS will have a replacement.
      
      Reviewed By: mshneer
      
      Differential Revision: D29597486
      
      fbshipit-source-id: a108b945ce32eb17cedefad89630c9171fc5c9c2
      c226674c
    • Fred Emmott's avatar
      Make `travis_docker_build.sh` macos-compatible · cb55944f
      Fred Emmott authored
      Summary:
      BSD readlink doesn't have -f.
      
      I'm not using TravisCI, however docker is still convenient for reproducing builds locally.
      
      Reviewed By: yns88
      
      Differential Revision: D29523333
      
      fbshipit-source-id: e01169f3eabca7b8baec95bc70fe119cad201b35
      cb55944f
    • Giuseppe Ottaviano's avatar
      Implement contains() in sorted vector types · 1787a34a
      Giuseppe Ottaviano authored
      Summary: Implement C++20 `contains()`.
      
      Reviewed By: yfeldblum
      
      Differential Revision: D29556436
      
      fbshipit-source-id: 1c3a8e6e9a07f6e47491b063f88df6bf9da2d87b
      1787a34a
    • Pranjal Raihan's avatar
      Make ThreadLocalPtr::Accessor::Iterator default construct as singular · ee8d0a53
      Pranjal Raihan authored
      Reviewed By: yfeldblum, mshneer
      
      Differential Revision: D29578762
      
      fbshipit-source-id: 2aa9af8f68825997db8b71fc97dda5477fa57413
      ee8d0a53