__folly_memcpy: allow overlapping buffers of any size and provide drop-in replacement for memmove
Summary:
`__folly_memcpy` already behaved as `memmove` for n <= 256B.

For n > 256B and overlapping buffers `__folly_memcpy` did some work:
- it determined the size is large enough for 128B AVX loads/stores
- it has already copied the first 128B and the last 128B in YMM registers

but then discarded it and fell back on `memmove`.

Instead of wasting this work, forward copy (dst < src) or backward copy (dst > src).
- use unaligned loads + aligned stores, but not non-temporal stores
- for dst < src forward copy in 128 byte batches:
-- unaligned load the first 32 bytes & last 4 x 32 bytes
-- forward copy (unaligned load + aligned stores) 4 x 32 bytes at a time
-- unaligned store the first 32 bytes & last 4 x 32 bytes
- for dst > src backward copy in 128 byte batches:
-- unaligned load the first 4 x 32 bytes & last 32 bytes
-- backward copy (unaligned load + aligned stores) 4 x 32 bytes at a time
-- unaligned store the first 4 x 32 bytes & last 32 bytes

Reviewed By: yfeldblum

Differential Revision: D31915389

fbshipit-source-id: 2c0197b2bddc102a7fb8f70a6f43e79ac994dc73
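
The forward and backward strategies in the summary can be sketched in portable C. This is only an illustration under stated assumptions, not the assembly from this change: `sketch_overlap_copy`, `VEC`, and `BATCH` are hypothetical names, and plain `memcpy` calls through a local buffer stand in for the 32-byte YMM loads and stores. The sketch keeps the same order of operations: save the edge blocks first, copy 128-byte batches starting from a 32-byte-aligned destination address, then write the saved edges last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC = 32, BATCH = 4 * VEC }; /* one YMM register, one 4 x 32B batch */

/* Hypothetical helper: memmove-style copy for n > 256 bytes.
 * Local buffers model registers: each batch is fully loaded before it is
 * stored, so a store can never clobber source bytes still to be read. */
static void sketch_overlap_copy(unsigned char *dst, const unsigned char *src,
                                size_t n)
{
    unsigned char head[BATCH], tail[BATCH], regs[BATCH];
    assert(n > 2 * BATCH); /* smaller sizes take a different path */
    if (dst == src) return;

    if ((uintptr_t)dst < (uintptr_t)src) {
        /* Forward: save the first 32 bytes and the last 4 x 32 bytes. */
        memcpy(head, src, VEC);
        memcpy(tail, src + n - BATCH, BATCH);
        /* Skip ahead so that dst + off is 32-byte aligned. */
        size_t off = (VEC - (uintptr_t)dst % VEC) % VEC;
        for (; off + BATCH <= n; off += BATCH) {
            memcpy(regs, src + off, BATCH); /* unaligned loads */
            memcpy(dst + off, regs, BATCH); /* aligned stores  */
        }
        /* Edge stores go last; they cover the unaligned start and the
         * remainder shorter than one batch. */
        memcpy(dst, head, VEC);
        memcpy(dst + n - BATCH, tail, BATCH);
    } else {
        /* Backward: save the first 4 x 32 bytes and the last 32 bytes. */
        memcpy(head, src, BATCH);
        memcpy(tail, src + n - VEC, VEC);
        /* Step back so that dst + end is 32-byte aligned. */
        size_t end = n - (uintptr_t)(dst + n) % VEC;
        while (end >= BATCH) {
            end -= BATCH;
            memcpy(regs, src + end, BATCH); /* unaligned loads */
            memcpy(dst + end, regs, BATCH); /* aligned stores  */
        }
        memcpy(dst, head, BATCH);
        memcpy(dst + n - VEC, tail, VEC);
    }
}
```

Writing the saved edge blocks only after the main loop matters for small offsets between the buffers: storing the head early in the forward case could overwrite source bytes that the first batch has not loaded yet.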