DataSketches is an open-source, high-performance library of stochastic streaming algorithms, commonly called “sketches” in the data sciences. Sketches are small, stateful programs that process massive data as a stream and can provide approximate answers, with mathematical guarantees, to computationally difficult queries orders of magnitude faster than traditional, exact methods.
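To make the idea concrete, here is a minimal sketch of one classic technique, a k-minimum-values (KMV) distinct-count sketch. It is an illustration under stated assumptions, not the DataSketches API: the class name `KMVSketch`, the choice of SHA-1 as the hash, and the default `k` are all hypothetical choices made for this example. The sketch keeps only the `k` smallest normalized hash values it has seen, so its memory stays bounded by `k` no matter how long the stream is, and it estimates the number of distinct items as `(k - 1) / U_k`, where `U_k` is the k-th smallest hash.

```python
import bisect
import hashlib


class KMVSketch:
    """Hypothetical k-minimum-values distinct-count sketch (illustration only)."""

    def __init__(self, k=256):
        self.k = k
        self.mins = []  # sorted list of the k smallest normalized hashes seen so far

    @staticmethod
    def _hash(item):
        # Assumption: SHA-1 of the item's string form, first 8 bytes mapped to [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        h = self._hash(item)
        i = bisect.bisect_left(self.mins, h)
        if i < len(self.mins) and self.mins[i] == h:
            return  # duplicate item: its hash is already retained
        if len(self.mins) < self.k:
            self.mins.insert(i, h)
        elif h < self.mins[-1]:
            self.mins.insert(i, h)
            self.mins.pop()  # evict the largest retained hash to keep size <= k

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # exact while fewer than k distinct items seen
        return (self.k - 1) / self.mins[-1]


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(100_000):
        sketch.update(i % 50_000)  # 50,000 distinct values, each seen twice
    print(round(sketch.estimate()))
```

The relative standard error of this estimator shrinks roughly as `1/sqrt(k)`, which is the usual accuracy-versus-space trade-off a sketch exposes: a larger `k` costs more memory but gives a tighter estimate.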
After 8 years of development and 5 years as open source, we have begun the important migration from a stand-alone GitHub site to membership in the Apache Software Foundation community. We ask for your patience while we undergo this migration.
Please continue to use DataSketches.GitHub.io for all overview documentation and for access to the online Javadocs for the time being.
Please continue to use our Google Groups forum or the GitHub Issues of the specific repositories to bring issues or questions to our attention.
Please continue to use the Maven Central groupId “com.yahoo.datasketches” to locate current and past release Jars until we have formal releases under Apache.
The datasketches.apache.org website will be a placeholder until we have migrated our current community website from DataSketches.GitHub.io. For detailed project information, please continue to visit DataSketches.GitHub.io.
As the repositories under GitHub.com/DataSketches migrate, they will disappear from the GitHub.com/DataSketches organization page. Please refer to this list to be directed to the proper locations.
Disclaimer: Apache DataSketches is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.