Commit 80c05933 authored by Zino Benaissa, committed by Facebook GitHub Bot

heap_vector_map, heap_vector_set

Summary:
Define `heap_vector_map` and `heap_vector_set`. These feature a fast lookup (find operation) that does not rely on `std::lower_bound` or a similar binary search to find keys. Elements are not laid out in sorted order. Instead, they are laid out in heap order, also known as Eytzinger order. A heap-order layout maximizes the memory and cache locality of lookup operations.

For example, a heap vector set containing the elements 0-7 lays them out as follows (the children of position i are at positions 2i+1 and 2i+2):

  position in container   element
  heap_container[0]  =  4 <-- middle
  heap_container[1]  =  2
  heap_container[2]  =  6
  heap_container[3]  =  1
  heap_container[4]  =  3
  heap_container[5]  =  5
  heap_container[6]  =  7 <-- max
  heap_container[7]  =  0 <-- min

`heap_vector_map` (referred to below as HM) is almost a drop-in replacement for `sorted_vector_map` (referred to as SM). With very few exceptions, all SM APIs are supported and have the same semantics.

Although HM works correctly for any key type that SM supports, the speedup of the find operation (random key search) is typically limited to maps with small key types whose comparison operations can be inlined. Measurements suggest up to a 2x speedup of the find operation for HM compared to SM.

The key tradeoff is that HM has slower insertion, removal, and iteration. Prefer HM when updates and traversals are rare and lookups are the dominant operation. Of course, SM itself has slow insertion and removal, so if these operations are sufficiently common, another map type entirely would be preferable.

Key similarities:

1) They have the same in-situ sizes: `sizeof(HM) == sizeof(SM)`.
2) Insertions and removals. As expected, these operations are slower for HM: both SM and HM must move elements to preserve sorted or heap order, but the SM operation can be much simpler.
3) Fast construction of HM from a sorted vector or from an SM.
4) An iterator that follows sorted order, compatible with SM's. This iterator is complex, though: when sorted iteration order is not required, it is faster to iterate in the underlying container's order.

Key differences:

1) iterate() is a new API that returns the iterator range of the underlying container vector. iterate() enables the fastest traversal of the heap container elements. For example, with HS being the heap vector set of elements 0-7 shown above:

          for (auto& e : HS.iterate()) { // traversed as laid out in memory
              std::cout << e << ", ";
          }

   prints: 4, 2, 6, 1, 3, 5, 7, 0,

          for (auto& e : HS) { // traversed in key-sorted order
              std::cout << e << ", ";
          }

   prints: 0, 1, 2, 3, 4, 5, 6, 7,

2) data() is purposely not provided, because the start of the underlying storage does not hold the first (smallest) element. If the start address is needed, use HM.iterate().data() instead.

Reviewed By: Gownta

Differential Revision: D32128733

fbshipit-source-id: 1df7372720b969ee7a84004ded101db132e0c224
parent c5d1dc7b
@@ -567,6 +567,7 @@ if (BUILD_TESTS)
 TEST f14_fwd_test SOURCES F14FwdTest.cpp
 TEST f14_map_test SOURCES F14MapTest.cpp
 TEST f14_set_test WINDOWS_DISABLED SOURCES F14SetTest.cpp
+TEST heap_vector_types_test SOURCES heap_vector_types_test.cpp
 TEST foreach_test SOURCES ForeachTest.cpp
 TEST merge_test SOURCES MergeTest.cpp
 TEST sparse_byte_set_test SOURCES SparseByteSetTest.cpp