native aarch64 support for F14 hash tables

Summary: This diff adds support for the F14 algorithm using the advanced SIMD instructions (NEON) that are part of aarch64.

Reviewed By: yfeldblum

Differential Revision: D7765273

fbshipit-source-id: 866a8a3481ad60b8aadcfb39718d6a5e62bbe07c

Commit 19c316ae authored May 01, 2018 by Nathan Bronson Committed by Facebook Github Bot May 01, 2018
5 changed files
--- a/folly/container/F14.md
+++ b/folly/container/F14.md
@@ -2,11 +2,12 @@

 F14 is a 14-way probing hash table that resolves collisions by double
 hashing.  Up to 14 keys are stored in a chunk at a single hash table
-position.  SSE2 vector instructions are used to filter within a chunk;
-intra-chunk search takes only a handful of instructions.  **F14** refers
-to the fact that the algorithm **F**ilters up to **14** keys at a time.
-This strategy allows the hash table to be operated at a high maximum
-load factor (12/14) while still keeping probe chains very short.
+position.  Vector instructions (SSE2 on x86_64, NEON on aarch64)
+are used to filter within a chunk; intra-chunk search takes only a
+handful of instructions.  **F14** refers to the fact that the algorithm
+**F**ilters up to **14** keys at a time.  This strategy allows the hash
+table to be operated at a high maximum load factor (12/14) while still
+keeping probe chains very short.

 F14 provides compelling replacements for most of the hash tables we use in
 production at Facebook.  Switching to it can improve memory efficiency
@@ -157,10 +158,12 @@ unlikely to perform any key comparisons, successful searches are likely
 to perform exactly 1 comparison, and all of the resulting branches are
 pretty predictable.

-The vector search uses SSE2 intrinsics.  SSE2 is a non-optional part
-of the x86_64 platform, so every 64-bit x86 platform supports them.
-AARCH64's vector instructions will allow a similar strategy, although
-the lack of a movemask operation complicates things a bit.
+The vector search is coded using SIMD intrinsics, SSE2 on x86_64 and
+NEON on aarch64.  These instructions are a non-optional part of those
+platforms (unlike later SIMD instruction sets like AVX2 or SVE), so no
+special compilation flags are required.  The exact vector operations
+performed differs between x86_64 and aarch64 because aarch64 lacks a
+movemask instruction, but the F14 algorithm is the same.

 ## WHAT ABOUT MEMORY OVERHEAD FOR SMALL TABLES?


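The F14.md text above says the vector operations differ between x86_64 and aarch64 because aarch64 lacks a movemask instruction. The following is a minimal sketch of that difference, not the code from this commit. The names `matchTags`, `matchTagsScalar`, `lowestMatch`, `kTagCount`, and `kWeights` are hypothetical, and the 16-byte layout with 14 tag bytes is an assumption made for illustration. The NEON branch uses one common movemask emulation (AND with per-lane bit weights, then a horizontal add), which may not be what F14 itself does.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

// Assumption for illustration: 16 one-byte slots, the first 14 hold tags.
constexpr unsigned kTagCount = 14;
constexpr unsigned kFullMask = (1u << kTagCount) - 1;

// Portable reference: bit i of the result is set when tags[i] == needle.
inline unsigned matchTagsScalar(const std::uint8_t* tags, std::uint8_t needle) {
  unsigned mask = 0;
  for (unsigned i = 0; i < kTagCount; ++i) {
    if (tags[i] == needle) {
      mask |= 1u << i;
    }
  }
  return mask;
}

// Same result as matchTagsScalar, computed with one vector compare.
// tags must point at 16 readable bytes.
inline unsigned matchTags(const std::uint8_t* tags, std::uint8_t needle) {
#if defined(__SSE2__)
  __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tags));
  __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(static_cast<char>(needle)));
  // movemask packs the top bit of each byte lane into a 16-bit integer.
  return static_cast<unsigned>(_mm_movemask_epi8(eq)) & kFullMask;
#elif defined(__aarch64__)
  // No movemask on NEON: keep one distinct bit per lane, then sum each half.
  static const std::uint8_t kWeights[16] = {
      1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128};
  uint8x16_t eq = vceqq_u8(vld1q_u8(tags), vdupq_n_u8(needle));
  uint8x16_t bits = vandq_u8(eq, vld1q_u8(kWeights));
  unsigned lo = vaddv_u8(vget_low_u8(bits));
  unsigned hi = vaddv_u8(vget_high_u8(bits));
  return (lo | (hi << 8)) & kFullMask;
#else
  return matchTagsScalar(tags, needle);
#endif
}

// Index of the lowest set bit, i.e. the first slot whose tag matched.
inline unsigned lowestMatch(unsigned mask) {
  assert(mask != 0);
  unsigned index = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++index;
  }
  return index;
}
```

A caller would compute the mask once per chunk and compare full keys only for the slots whose bits are set, which is why unsuccessful searches rarely perform a key comparison.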
--- a/folly/container/detail/F14IntrinsicsAvailability.h
+++ b/folly/container/detail/F14IntrinsicsAvailability.h
@@ -19,9 +19,9 @@
 #include <folly/Portability.h>

 // clang-format off
-// F14 is only available on x86 with SSE2 intrinsics (so far)
+// F14 has been implemented for SSE2 and AARCH64 NEON (so far)
 #ifndef FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE
-# if FOLLY_SSE >= 2
+# if FOLLY_SSE >= 2 || FOLLY_AARCH64
 #  define FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE 1
 # else
 #  define FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE 0

--- a/folly/container/detail/F14Policy.h
+++ b/folly/container/detail/F14Policy.h
@@ -89,7 +89,7 @@ struct BasePolicy
    return IsAvalanchingHasher<Hasher, Key>::value;
  }

-  using Chunk = SSE2Chunk<Item>;
+  using Chunk = F14Chunk<Item>;
  using ChunkPtr = typename std::pointer_traits<
      typename AllocTraits::pointer>::template rebind<Chunk>;
  using ItemIter = F14ItemIter<ChunkPtr>;
@@ -240,7 +240,7 @@ class BaseIter : public std::iterator<
                     ValuePtr,
                     decltype(*std::declval<ValuePtr>())> {
 protected:
-  using Chunk = SSE2Chunk<Item>;
+  using Chunk = F14Chunk<Item>;
  using ChunkPtr =
      typename std::pointer_traits<ValuePtr>::template rebind<Chunk>;
  using ItemIter = F14ItemIter<ChunkPtr>;

--- a/folly/container/detail/F14Table.cpp
+++ b/folly/container/detail/F14Table.cpp
@@ -24,7 +24,7 @@ namespace folly {
 namespace f14 {
 namespace detail {

-__m128i kEmptyTagVector = {};
+TagVector kEmptyTagVector = {};

 } // namespace detail
 } // namespace f14

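The change from `__m128i` to `TagVector` in F14Table.cpp suggests a platform-dependent alias, but the definition of `TagVector` is not shown in this part of the diff. A sketch of how such an alias could work, using the hypothetical names `DemoTagVector`, `kDemoEmptyTagVector`, and `allBytesZero`, and assuming GCC or Clang vector types:

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>
using DemoTagVector = __m128i; // 128-bit SSE2 integer vector
#elif defined(__aarch64__)
#include <arm_neon.h>
using DemoTagVector = uint8x16_t; // 16 unsigned byte lanes on NEON
#else
// Fallback so the sketch compiles on other targets.
struct alignas(16) DemoTagVector {
  std::uint8_t bytes[16];
};
#endif

static_assert(sizeof(DemoTagVector) == 16, "one 128-bit vector of tags");

// The same declaration compiles on every branch above; empty braces
// zero-initialize all 16 bytes.
DemoTagVector kDemoEmptyTagVector = {};

// Inspect the raw bytes without depending on the vector type's lane API.
inline bool allBytesZero(const DemoTagVector& v) {
  unsigned char raw[sizeof(DemoTagVector)];
  std::memcpy(raw, &v, sizeof(raw));
  for (unsigned char b : raw) {
    if (b != 0) {
      return false;
    }
  }
  return true;
}
```

With an alias like this, code that only stores or zero-initializes the vector does not need per-platform branches; only the functions that issue intrinsics do.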
--- a/folly/container/detail/F14Table.h
+++ b/folly/container/detail/F14Table.h