Succinct-Core is a Java implementation of Succinct's core algorithms, as described in the NSDI'15 paper.
This library has no external requirements.
To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:
<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-core</artifactId>
    <version>0.1.8</version>
</dependency>

The Succinct-Core library exposes Succinct in three layers:
SuccinctCore
SuccinctFile
SuccinctIndexedFile
SuccinctCore exposes the basic construction primitive for all internal
data-structures, along with accessors to the core data-structures
(e.g., NPA, SA and ISA, which the paper terms NextCharIdx, AoS2Input and
Input2AoS, respectively).
SuccinctBuffer provides an implementation of SuccinctCore.
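To make the three core structures concrete, here is a minimal, uncompressed sketch that builds SA, ISA and NPA for a small input by plain sorting. The class and method names are hypothetical and are not part of Succinct-Core, and the library's own construction and compressed layout differ. The sketch assumes the standard definitions: SA[i] is the start offset of the i-th smallest suffix, ISA is the inverse permutation of SA, and NPA[i] = ISA[(SA[i] + 1) mod n].

```java
import java.util.Arrays;

// Hypothetical illustration class; NOT part of the Succinct-Core API.
class SuffixArraysSketch {

    // SA[i] = start offset of the i-th smallest suffix (naive sort, for clarity only).
    static int[] buildSA(byte[] input) {
        int n = input.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> compareSuffixes(input, a, b));
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = order[i];
        return sa;
    }

    // Lexicographic comparison of the suffixes starting at a and b (unsigned bytes).
    static int compareSuffixes(byte[] input, int a, int b) {
        int n = input.length;
        while (a < n && b < n) {
            int x = input[a] & 0xFF, y = input[b] & 0xFF;
            if (x != y) return Integer.compare(x, y);
            a++;
            b++;
        }
        // The suffix that ran out first is the shorter one and sorts first.
        return Integer.compare(b, a);
    }

    // ISA is the inverse permutation of SA: ISA[SA[i]] = i.
    static int[] buildISA(int[] sa) {
        int[] isa = new int[sa.length];
        for (int i = 0; i < sa.length; i++) isa[sa[i]] = i;
        return isa;
    }

    // NPA[i] = rank of the suffix that starts one position after suffix SA[i].
    static int[] buildNPA(int[] sa, int[] isa) {
        int n = sa.length;
        int[] npa = new int[n];
        for (int i = 0; i < n; i++) npa[i] = isa[(sa[i] + 1) % n];
        return npa;
    }

    public static void main(String[] args) {
        byte[] input = "banana$".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        int[] sa = buildSA(input);
        int[] isa = buildISA(sa);
        int[] npa = buildNPA(sa, isa);
        System.out.println("SA  = " + Arrays.toString(sa));
        System.out.println("ISA = " + Arrays.toString(isa));
        System.out.println("NPA = " + Arrays.toString(npa));
    }
}
```

For the input "banana$" this yields SA = [6, 5, 3, 1, 0, 4, 2], ISA = [4, 3, 6, 2, 5, 1, 0] and NPA = [4, 0, 5, 6, 3, 1, 2].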
SuccinctFile builds on top of SuccinctCore and exposes the interface for
three main functionalities:
byte[] extract(int offset, int length)
long[] search(byte[] query)
long count(byte[] query)
These primitives allow random access (extract) and search (count, search)
directly on the compressed representation of flat-file (i.e., unstructured)
data. SuccinctFileBuffer is a ByteBuffer-based implementation of SuccinctFile;
see the example program for how SuccinctFileBuffer can be used.
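As a reference for what the three primitives compute, the following sketch implements them naively over an uncompressed byte array. NaiveFileSketch is a hypothetical class and not part of the library; it reuses only the method signatures listed above. It assumes that search returns the offsets of all occurrences of the query, that count returns the number of such occurrences, and that extract clamps at the end of the data. Succinct answers the same queries on the compressed representation, without scanning the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, uncompressed stand-in that mirrors the three signatures above.
// It is NOT the Succinct implementation; it only illustrates the assumed semantics.
class NaiveFileSketch {
    private final byte[] data;

    NaiveFileSketch(byte[] data) {
        this.data = data;
    }

    // Returns up to `length` bytes starting at `offset` (assumption: clamp at end of data).
    byte[] extract(int offset, int length) {
        int end = Math.min(data.length, offset + length);
        return Arrays.copyOfRange(data, offset, end);
    }

    // Returns the offsets of all (possibly overlapping) occurrences of `query`.
    long[] search(byte[] query) {
        List<Long> hits = new ArrayList<>();
        for (int i = 0; i + query.length <= data.length; i++) {
            if (matchesAt(i, query)) hits.add((long) i);
        }
        long[] result = new long[hits.size()];
        for (int i = 0; i < result.length; i++) result[i] = hits.get(i);
        return result;
    }

    // Returns the number of occurrences of `query`.
    long count(byte[] query) {
        return search(query).length;
    }

    private boolean matchesAt(int start, byte[] query) {
        for (int j = 0; j < query.length; j++) {
            if (data[start + j] != query[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        NaiveFileSketch file = new NaiveFileSketch("banana".getBytes());
        System.out.println(Arrays.toString(file.search("ana".getBytes())));
        System.out.println(file.count("an".getBytes()));
        System.out.println(new String(file.extract(2, 3)));
    }
}
```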
Finally, SuccinctIndexedFile builds on the functionality of both SuccinctCore
and SuccinctFile to expose a record buffer, i.e., a collection of records.
This interface is used by the Succinct on Apache Spark interfaces,
particularly the SuccinctRDD and SuccinctTableRDD implementations.
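One way to picture a record buffer is as a flat byte array plus an array of record start offsets. The sketch below is an uncompressed illustration under that assumption. RecordBufferSketch and its methods (record, recordIdOf, recordsContaining) are hypothetical names and do not reflect the actual SuccinctIndexedFile API. It also assumes records are non-empty and that a match only counts if it lies within a single record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a record buffer; NOT the SuccinctIndexedFile API.
class RecordBufferSketch {
    private final byte[] data;    // all records concatenated
    private final int[] offsets;  // offsets[i] = start of record i (ascending)

    RecordBufferSketch(byte[][] records) {
        offsets = new int[records.length];
        int total = 0;
        for (int i = 0; i < records.length; i++) {
            offsets[i] = total;
            total += records[i].length;
        }
        data = new byte[total];
        for (int i = 0; i < records.length; i++) {
            System.arraycopy(records[i], 0, data, offsets[i], records[i].length);
        }
    }

    // Returns the bytes of record `id`.
    byte[] record(int id) {
        int end = (id + 1 < offsets.length) ? offsets[id + 1] : data.length;
        return Arrays.copyOfRange(data, offsets[id], end);
    }

    // Maps a byte offset in the flat data to the id of the record containing it.
    int recordIdOf(int offset) {
        int pos = Arrays.binarySearch(offsets, offset);
        return pos >= 0 ? pos : -pos - 2; // insertion point minus one
    }

    // Returns the ids of records that contain `query`, in ascending order.
    int[] recordsContaining(byte[] query) {
        List<Integer> ids = new ArrayList<>();
        if (query.length == 0) return new int[0];
        for (int i = 0; i + query.length <= data.length; i++) {
            if (!matchesAt(i, query)) continue;
            int id = recordIdOf(i);
            // Skip matches that straddle a record boundary.
            if (id != recordIdOf(i + query.length - 1)) continue;
            if (ids.isEmpty() || ids.get(ids.size() - 1) != id) ids.add(id);
        }
        int[] result = new int[ids.size()];
        for (int i = 0; i < result.length; i++) result[i] = ids.get(i);
        return result;
    }

    private boolean matchesAt(int start, byte[] query) {
        for (int j = 0; j < query.length; j++) {
            if (data[start + j] != query[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[][] records = {
            "apple pie".getBytes(), "banana".getBytes(), "pineapple".getBytes()
        };
        RecordBufferSketch buffer = new RecordBufferSketch(records);
        System.out.println(Arrays.toString(buffer.recordsContaining("apple".getBytes())));
        System.out.println(new String(buffer.record(1)));
    }
}
```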
We provide an example program that demonstrates the count, search and extract
functionality of SuccinctFile. A convenience script in the bin/ directory runs
the example. Its usage is as follows:
./bin/succinct-shell <file-name>
where <file-name> is the name of the file to analyze.