Change loop order to maximize cache hits
It is better to write to memory locations that are grouped together, which reduces the probability of a cache miss. This simple change improved the function's execution speed by 70%. Also removed the unnecessary memset for the buffer.
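As a rough illustration of this kind of change (not the actual function touched by this commit), the sketch below fills a row-major C array in two loop orders. The names `ROWS`, `COLS`, `fill_strided`, and `fill_contiguous` are hypothetical. Both versions store the same values, but the second one writes to adjacent addresses in its inner loop, which tends to be friendlier to the cache. Because every element is overwritten, clearing the buffer with memset beforehand would be redundant in a case like this.

```c
#include <stddef.h>

/* Hypothetical buffer dimensions, chosen only for illustration. */
enum { ROWS = 64, COLS = 64 };

/* Before: the inner loop walks down a column, so consecutive writes
 * are COLS elements apart in memory (C arrays are row-major). */
void fill_strided(int buf[ROWS][COLS])
{
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            buf[r][c] = (int)(r * COLS + c);
}

/* After: the loops are swapped, so the inner loop walks along a row
 * and consecutive writes land on adjacent addresses. Every element is
 * assigned, so no prior memset of buf is needed. */
void fill_contiguous(int buf[ROWS][COLS])
{
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            buf[r][c] = (int)(r * COLS + c);
}
```

The actual speedup from such a swap depends on the buffer size, the element type, and the hardware, so it is worth measuring on the real function rather than assuming a fixed figure.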