Up to 25% performance improvement on Skylake-based platforms
Summary: Implemented 8x-unrolled linear folding to hide the latency of _mm_clmulepi64_si128 on Skylake.

Reviewed By: terrelln

Differential Revision: D30376340

fbshipit-source-id: 7828639c135ba51048b60c621f5427c7ac1938b4
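
The folding idea in the summary can be illustrated with a small portable model. This is a sketch under stated assumptions, not the code from this change: `gf2_mulmod`, `fold_serial`, and `fold_unrolled8` are hypothetical names, the carry-less multiply is emulated in software instead of using `_mm_clmulepi64_si128`, and the result is a plain polynomial remainder (no bit reflection, initial value, or final XOR), so it is not a complete CRC32C. What it shows is the structure: eight accumulators that do not depend on each other, so that on real hardware eight multiplies can be in flight at once instead of each waiting for the previous result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 32 bits of the CRC32C polynomial P; the x^32 term is implicit.
 * Bit i of a word stands for x^i (non-reflected order, chosen for clarity). */
#define POLY_LOW 0x1EDC6F41u

/* Software stand-in for a carry-less multiply plus reduction:
 * returns a * b in GF(2)[x] modulo P. */
static uint32_t gf2_mulmod(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    while (b) {
        if (b & 1u) r ^= a;            /* add a * x^i for each set bit i of b */
        b >>= 1;
        uint32_t carry = a & 0x80000000u;
        a <<= 1;                       /* multiply a by x */
        if (carry) a ^= POLY_LOW;      /* reduce once the degree reaches 32 */
    }
    return r;
}

/* Baseline: a single dependency chain. Each multiply needs the previous
 * accumulator, so multiply latency is paid on every step. */
static uint32_t fold_serial(const uint32_t *w, size_t n) {
    const uint32_t x32 = POLY_LOW;     /* x^32 mod P */
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = gf2_mulmod(acc, x32) ^ w[i];
    return acc;
}

/* 8x-unrolled variant: lane j folds words j, j+8, j+16, ... using the
 * constant x^256 mod P. The eight lane updates are independent of each other.
 * Assumption for brevity: n is a multiple of 8. */
static uint32_t fold_unrolled8(const uint32_t *w, size_t n) {
    assert(n % 8 == 0);
    const uint32_t x32 = POLY_LOW;
    uint32_t x256 = 1;
    for (int i = 0; i < 8; i++)
        x256 = gf2_mulmod(x256, x32);  /* (x^32)^8 = x^256 mod P */

    uint32_t lane[8] = {0};
    for (size_t i = 0; i < n; i += 8)
        for (int j = 0; j < 8; j++)
            lane[j] = gf2_mulmod(lane[j], x256) ^ w[i + j];

    /* Combine the lanes: lane j still has to be shifted by 32 * (7 - j) bits. */
    uint32_t acc = 0;
    for (int j = 0; j < 8; j++)
        acc = gf2_mulmod(acc, x32) ^ lane[j];
    return acc;
}
```

Both functions return the same remainder for any input whose length is a multiple of 8 words; the unrolled form only changes the order of the work. A production version would also need a tail loop for leftover words and would keep the lanes in SIMD registers.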