This is a high-performance implementation of Philippe Flajolet’s HLL sketch, but with significantly improved error behavior.
Iterates within a given Memory extracting integer pairs.
This performs union operations for all HllSketches.
Specifies the target type of HLL sketch to be created.
If the ONLY use case for sketching is counting uniques and merging, the HLL sketch is the highest performing in terms of accuracy for space consumed. For large counts, this HLL version will be 2 to 16 times smaller than the Theta Sketches for the same accuracy.
HLL sketches do not retain any of the hash values of the associated unique identifiers, so if you anticipate a future need to leverage associations with those hash values, Theta Sketches would be a better choice.
HLL sketches cannot be intermixed or merged in any way with Theta Sketches.
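To make the points above concrete, the following is a minimal, self-contained sketch of the textbook HyperLogLog idea in plain Java. It is an illustration only and is not this package's implementation: the class name `ToyHll`, the `mix` hash stand-in, and the plain one-byte-per-bucket layout are all assumptions made for the example. It shows that each bucket keeps only a small counter (no hash values are retained) and that merging two sketches of the same size reduces to a bucket-wise maximum.

```java
/** Toy HyperLogLog for illustration only; NOT the library's implementation. */
final class ToyHll {
    private final int lgK;
    // One small counter per bucket. No hash values are retained.
    private final byte[] registers;

    ToyHll(int lgK) {
        if (lgK < 7 || lgK > 20) {
            throw new IllegalArgumentException("toy version supports lgK in [7, 20]");
        }
        this.lgK = lgK;
        this.registers = new byte[1 << lgK];
    }

    // SplitMix64-style mixer, used here as a hypothetical stand-in for a real hash.
    private static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void update(long item) {
        long h = mix(item);
        int bucket = (int) (h >>> (64 - lgK));   // top lgK bits choose the bucket
        long rest = h << lgK;                    // remaining bits feed the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - lgK + 1);
        if (rank > registers[bucket]) {
            registers[bucket] = (byte) rank;     // keep only the maximum rank seen
        }
    }

    /** Union: bucket-wise maximum. The toy version requires equal lgK. */
    void merge(ToyHll other) {
        if (other.lgK != lgK) {
            throw new IllegalArgumentException("toy version requires equal lgK");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Classic HLL estimator with the linear-counting correction for small counts. */
    double estimate() {
        int k = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / k);   // constant valid for k >= 128
        double raw = alpha * k * k / sum;
        if (raw <= 2.5 * k && zeros > 0) {
            return k * Math.log((double) k / zeros);
        }
        return raw;
    }

    public static void main(String[] args) {
        ToyHll a = new ToyHll(12);
        ToyHll b = new ToyHll(12);
        for (long i = 0; i < 60_000; i++) a.update(i);
        for (long i = 40_000; i < 100_000; i++) b.update(i);
        a.merge(b);                                   // true union size is 100,000
        System.out.println("estimated uniques: " + a.estimate());
    }
}
```

Because only the per-bucket maxima survive, the merged toy sketch is identical to one that saw every item directly, and there is nothing left that could be matched against the retained hash values of a Theta Sketch. The real classes in this package add compact 4-, 6-, and 8-bit bucket encodings and improved estimators, which this example does not attempt to reproduce.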
Copyright © 2015–2020 The Apache Software Foundation. All rights reserved.