Table of Contents
## The KMV Sketch, Better Estimator, Size = *k*

Now lets choose *k = 3*, which means that we will keep the 3 smallest hash values that the cache has seen. The fractional distance that these *k* values consume is simply the value of the k^{th} hash value, or *V(k ^{th})*, which in this example is 0.195. This is also known as the

We want not only a more accurate estimate, but one that is also __ unbiased__. I’m going to skip a few pages of calculus[1] and reveal that we only need to subtract one in the numerator to achieve that. Our new, unbiased KMV estimator becomes

This is much closer to 10, but with such a small sample size we were also lucky.

Note that with our new estimator based on the *k* minimum values in the cache we don’t have to keep any hash values larger than *V(k ^{th})*. And, since

[1] For those interested in the mathematical proofs, Giroire has a straightforward and easy-to-follow development.