TDigest for estimating quantiles
Summary: The existing histogram approach for estimating quantiles is inadequate for common use cases. This diff introduces TDigests for estimating quantiles of streaming values. The current histogram approach suffers from two issues: - The user needs to know something about the universe of values, U, and the underlying distribution in order to get reasonably accurate values - Because the histogram is not biased, estimating tail quantiles such as p99.9 requires a large amount of memory. t-digests (https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) are biased digests that allow for accurate estimates of tail quantiles, while using a small amount of memory and without requiring the user to know anything about U. The scaling function in the paper is slow, because it uses arcsin. This approach uses a sqrt function instead. Details of why this works in the comments of the TDigest file. Reviewed By: simpkins Differential Revision: D7512432 fbshipit-source-id: 01460433ec7380fed88e5e942ef15f032dd794bb
Showing
folly/stats/TDigest.cpp
0 → 100644
folly/stats/TDigest.h
0 → 100644
Please register or sign in to comment