Commit 19c316ae authored by Nathan Bronson, committed by Facebook Github Bot

native aarch64 support for F14 hash tables

Summary:
This diff adds support for the F14 algorithm using the advanced
SIMD instructions (NEON) that are part of aarch64.

Reviewed By: yfeldblum

Differential Revision: D7765273

fbshipit-source-id: 866a8a3481ad60b8aadcfb39718d6a5e62bbe07c
parent 25f4df21
@@ -2,11 +2,12 @@
F14 is a 14-way probing hash table that resolves collisions by double
hashing. Up to 14 keys are stored in a chunk at a single hash table
-position. SSE2 vector instructions are used to filter within a chunk;
-intra-chunk search takes only a handful of instructions. **F14** refers
-to the fact that the algorithm **F**ilters up to **14** keys at a time.
-This strategy allows the hash table to be operated at a high maximum
-load factor (12/14) while still keeping probe chains very short.
+position. Vector instructions (SSE2 on x86_64, NEON on aarch64)
+are used to filter within a chunk; intra-chunk search takes only a
+handful of instructions. **F14** refers to the fact that the algorithm
+**F**ilters up to **14** keys at a time. This strategy allows the hash
+table to be operated at a high maximum load factor (12/14) while still
+keeping probe chains very short.
F14 provides compelling replacements for most of the hash tables we use in
production at Facebook. Switching to it can improve memory efficiency
@@ -157,10 +158,12 @@ unlikely to perform any key comparisons, successful searches are likely
to perform exactly 1 comparison, and all of the resulting branches are
pretty predictable.
-The vector search uses SSE2 intrinsics. SSE2 is a non-optional part
-of the x86_64 platform, so every 64-bit x86 platform supports them.
-AARCH64's vector instructions will allow a similar strategy, although
-the lack of a movemask operation complicates things a bit.
+The vector search is coded using SIMD intrinsics, SSE2 on x86_64 and
+NEON on aarch64. These instructions are a non-optional part of those
+platforms (unlike later SIMD instruction sets like AVX2 or SVE), so no
+special compilation flags are required. The exact vector operations
+performed differ between x86_64 and aarch64 because aarch64 lacks a
+movemask instruction, but the F14 algorithm is the same.
## WHAT ABOUT MEMORY OVERHEAD FOR SMALL TABLES?
......
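Reviewer sketch (not part of the commit): the documentation change above says the filter step differs between x86_64 and aarch64 because aarch64 has no movemask instruction. The snippet below illustrates that difference. The helper name `matchTags`, the 16-byte tag layout, and the NEON emulation (masking each lane to a distinct bit, then summing each half) are assumptions made for illustration; the collapsed diff may well use a different sequence of operations.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: returns a mask with bit i set when tags[i] == needle,
// considering only the first 14 of 16 tag bytes. `tags` must point to at
// least 16 readable bytes.
inline unsigned matchTags(const std::uint8_t* tags, std::uint8_t needle) {
  assert(tags != nullptr);
#if defined(__SSE2__) || defined(_M_X64)
  // x86_64: byte-wise compare, then movemask gathers one bit per lane.
  __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tags));
  __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(static_cast<char>(needle)));
  return static_cast<unsigned>(_mm_movemask_epi8(eq)) & 0x3fffu;
#elif defined(__aarch64__)
  // aarch64: no movemask, so keep a distinct bit in each matching lane and
  // add the lanes of each 8-byte half horizontally.
  static const std::uint8_t kBits[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                         1, 2, 4, 8, 16, 32, 64, 128};
  uint8x16_t eq = vceqq_u8(vld1q_u8(tags), vdupq_n_u8(needle));
  uint8x16_t bits = vandq_u8(eq, vld1q_u8(kBits));
  unsigned lo = vaddv_u8(vget_low_u8(bits));
  unsigned hi = vaddv_u8(vget_high_u8(bits));
  return (lo | (hi << 8)) & 0x3fffu;
#else
  // Scalar fallback so the sketch builds on other platforms.
  unsigned mask = 0;
  for (unsigned i = 0; i < 14; ++i) {
    if (tags[i] == needle) {
      mask |= 1u << i;
    }
  }
  return mask;
#endif
}
```

All three branches produce the same 14-bit mask, which is the sense in which the algorithm stays the same while the vector operations differ.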
@@ -19,9 +19,9 @@
#include <folly/Portability.h>
// clang-format off
-// F14 is only available on x86 with SSE2 intrinsics (so far)
+// F14 has been implemented for SSE2 and AARCH64 NEON (so far)
#ifndef FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE
-# if FOLLY_SSE >= 2
+# if FOLLY_SSE >= 2 || FOLLY_AARCH64
# define FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE 1
# else
# define FOLLY_F14_VECTOR_INTRINSICS_AVAILABLE 0
......
@@ -89,7 +89,7 @@ struct BasePolicy
return IsAvalanchingHasher<Hasher, Key>::value;
}
-using Chunk = SSE2Chunk<Item>;
+using Chunk = F14Chunk<Item>;
using ChunkPtr = typename std::pointer_traits<
typename AllocTraits::pointer>::template rebind<Chunk>;
using ItemIter = F14ItemIter<ChunkPtr>;
@@ -240,7 +240,7 @@ class BaseIter : public std::iterator<
ValuePtr,
decltype(*std::declval<ValuePtr>())> {
protected:
-using Chunk = SSE2Chunk<Item>;
+using Chunk = F14Chunk<Item>;
using ChunkPtr =
typename std::pointer_traits<ValuePtr>::template rebind<Chunk>;
using ItemIter = F14ItemIter<ChunkPtr>;
......
@@ -24,7 +24,7 @@ namespace folly {
namespace f14 {
namespace detail {
-__m128i kEmptyTagVector = {};
+TagVector kEmptyTagVector = {};
} // namespace detail
} // namespace f14
......
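Reviewer sketch (not part of the commit): the hunk above replaces the SSE2-specific `__m128i` with a platform-neutral `TagVector`. The definition of `TagVector` is in the collapsed diff and is not shown here, so the aliases below are an assumption about what such a type could look like; `kExampleEmptyTagVector` and `tagAt` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
using TagVector = __m128i; // assumed x86_64 representation
#elif defined(__aarch64__)
#include <arm_neon.h>
using TagVector = uint8x16_t; // assumed aarch64 representation
#else
struct TagVector { // scalar stand-in so the sketch builds elsewhere
  std::uint8_t bytes[16];
};
#endif

static_assert(sizeof(TagVector) == 16, "expected a 16-byte tag vector");

// Value-initialized, so every tag byte is zero on each platform.
static TagVector kExampleEmptyTagVector = {};

// Hypothetical accessor: copies the vector out to bytes and returns byte i.
inline std::uint8_t tagAt(const TagVector& v, std::size_t i) {
  assert(i < sizeof(TagVector));
  std::uint8_t bytes[sizeof(TagVector)];
  std::memcpy(bytes, &v, sizeof(bytes));
  return bytes[i];
}
```

Hiding the vector type behind one alias lets the declaration of the shared empty vector stay identical across platforms, which is what the one-line change above relies on.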
This diff is collapsed.