fix bad codegen by widening needle
Summary:
On architectures with SSE2 but not AVX2, the _mm_set1_epi8 intrinsic at the core of F14Table::findImpl expands to multiple instructions. One of those is a MOVD of either 4 or 8 byte width. Only the bottom byte of that move actually affects the result, but if a 1-byte needle has been spilled, then the MOVD will be a 4-byte load. GCC 5.5 has been observed to reload (or perhaps fuse a reload and a narrow) needle using a MOVZX with a 1-byte load in parallel to the MOVD. This combination causes a failure of store-to-load forwarding, which has a big performance penalty (60 nanoseconds per find on a microbenchmark).

Keeping needle >= 4 bytes avoids the problem and also happens to result in slightly more compact code.

Reviewed By: yfeldblum

Differential Revision: D9149727

fbshipit-source-id: 9e957207c23914da317e763eb944bf7dd43a5c51
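
Illustration (not part of the change itself): a minimal sketch of the pattern described in the summary, assuming a 16-byte tag vector and SSE2. The helper name matchTags and the choice of std::size_t for the needle are hypothetical and are not taken from the folly sources; the only point is that the needle stays at least 4 bytes wide until it reaches _mm_set1_epi8, so a spilled needle would be stored and reloaded at a width the MOVD can use.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

#include <cstddef>
#include <cstdint>

// Hypothetical helper, not folly's actual code.
// Returns a 16-bit mask in which bit i is set when tags[i] equals the
// low byte of needle. The needle parameter is deliberately wider than
// one byte (assumption: std::size_t); only its bottom byte is used.
inline unsigned matchTags(const std::uint8_t* tags, std::size_t needle) {
  // Broadcast the low byte of needle to all 16 lanes. The narrowing
  // cast happens only here, at the point of use.
  __m128i needleV = _mm_set1_epi8(static_cast<char>(needle));

  // Unaligned load of the 16 tag bytes.
  __m128i tagV = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tags));

  // Byte-wise equality: matching lanes become 0xFF, others 0x00.
  __m128i eqV = _mm_cmpeq_epi8(tagV, needleV);

  // Collapse the top bit of each lane into an integer bitmask.
  return static_cast<unsigned>(_mm_movemask_epi8(eqV));
}
```

Whether a given compiler actually spills needle, and which instructions it emits for the broadcast, depends on the compiler version and flags, so the generated assembly should be inspected rather than assumed.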