Quantile Digest Functions¶
Quantile digest and T-digest are two
older algorithms for estimating rank-based metrics. While T-digest generally has better
performance and better accuracy at the tails,
quantile digest supports more numeric types (bigint, double, real) and
provides a maximum rank error guarantee, which ensures relative uniformity of precision
along the quantiles. Quantile digests are also formally proven to support lossless merges,
while T-digest is not (though it does empirically demonstrate lossless merges).
Velox uses the modern KLL sketch algorithm for the approx_percentile function, which
provides stronger accuracy guarantees than both quantile digest and T-digest.
The quantile digest functions documented here exist primarily to support
pre-existing workloads that have data stored using the quantile
digest format for backward compatibility.
Data Structures¶
A quantile digest is a data sketch which stores approximate percentile
information. The Velox type for this data structure is called qdigest,
and it takes a parameter which must be one of bigint, double or
real which represent the set of numbers that may be ingested by the
qdigest. They may be merged without losing precision, and for storage
and retrieval they may be cast to/from VARBINARY.
In the function signatures below, T represents the parameterized type of the qdigest,
which can be bigint, double, or real.
Functions¶
- merge(qdigest<T>) qdigest<T>¶
Merges all input
qdigests into a singleqdigest.
- qdigest_agg(x: T) qdigest<T>¶
Returns the
qdigestwhich summarizes the approximate distribution of all input values ofx.
- qdigest_agg(x: T, w: bigint) qdigest<T>
Returns the
qdigestwhich summarizes the approximate distribution of all input values ofxusing the per-item weightw.
- qdigest_agg(x: T, w: bigint, accuracy: double) qdigest<T>
Returns the
qdigestwhich summarizes the approximate distribution of all input values ofxusing the per-item weightwand maximum error ofaccuracy.accuracymust be a value greater than zero and less than one, and it must be constant for all input rows.
- value_at_quantile(digest: qdigest<T>, quantile: double) T¶
Returns the approximate percentile values from the quantile digest
digestgiven thequantile. Thequantilemust be between zero and one (inclusive).
- values_at_quantiles(digest: qdigest<T>, quantiles: array<double>) array<T>¶
Returns the approximate percentile values as an array from the quantile digest
digestat each of the specified quantiles given in thequantilesarray. All quantile values must be between zero and one (inclusive).