Andre Nash authored
Summary:
This introduces `__folly_memcpy`, an implementation of `memcpy` that uses prefetch to speed up cold copies (data absent from L1) and uses overlapping copies to avoid as much branching as possible, which speeds up hot copies (data present in L1). The core ideas behind this memcpy are described in the file comment at the top of folly_memcpy.S.

`__folly_memcpy` *does* act as a `memmove`, although it isn't optimized for that purpose for copies of 257 or more bytes. This masks some undefined behavior bugs where code calls `memcpy` on an overlapping region of data. `perf` samples will show when `memmove` is called by `__folly_memcpy`, which will help identify these undefined behavior bugs for copies of 257 bytes or more.

Reviewed By: yfeldblum

Differential Revision: D23629205

fbshipit-source-id: 61ed66122cc8edf33154ea6e8b87f4223c0ffcc0
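
As a rough illustration of the overlapping-copy idea mentioned in the summary, here is a minimal C sketch. It is not taken from folly_memcpy.S: the helper name `copy_8_to_16` and the 8-to-16-byte size class are assumptions chosen for illustration only. Any length in that range is handled by two 8-byte moves whose ranges may overlap in the middle, so no per-length branch is needed. Both loads happen before either store, which is why a routine built this way also tolerates overlapping source and destination for that size class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of folly): copies n bytes, 8 <= n <= 16.
 * The first 8 and the last 8 bytes are copied; for n < 16 the two
 * ranges overlap, and the overlapping bytes are simply written twice. */
static void copy_8_to_16(void *dst, const void *src, size_t n) {
    uint64_t head;
    uint64_t tail;

    assert(n >= 8 && n <= 16);

    /* Load both pieces before storing anything, so the result is
     * correct even if dst and src overlap. The fixed-size memcpy calls
     * are a portable way to express unaligned 8-byte loads and stores. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

For example, `copy_8_to_16(dst, src, 11)` writes bytes 0..7 and then bytes 3..10, covering all 11 bytes with two stores and no branching on the exact length.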
40233942