Releases: jermp/sshash
Version 3.0.0
This release of the library features a restructured public API for the dictionary and its supported queries.
- "Advanced" lookup queries now include, besides the usual absolute kmer_id: contig information (contig_id and contig_size of the contig where the k-mer lies), the relative (within the contig) identifier of the k-mer (named kmer_id_in_contig), and the orientation of the k-mer in the contig. For any positive query, 0 <= kmer_id_in_contig < contig_size holds true.
- Streaming queries are now general, not just streaming membership queries as they were in the previous releases, and return advanced lookup information by default.
- Support for navigational queries has been added. Given a k-mer g[1..k], a navigational query determines whether g[2..k]+x is present (forward neighbourhood) and whether x+g[1..k-1] is present (backward neighbourhood) in the dictionary, for x = A, C, G, T ('+' here means string concatenation).
If a contig identifier is specified for a navigational query (rather than a k-mer), then the backward neighbourhood of the first k-mer and the forward neighbourhood of the last k-mer in the contig are returned.
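The definition of a navigational query above can be illustrated with a minimal sketch. It does not use the SSHash API: the dictionary is replaced by a plain std::unordered_set of k-mer strings, the names neighbourhood and navigate are hypothetical, and reverse complements are ignored for simplicity.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical result type: one flag per extension base, in the order A, C, G, T.
struct neighbourhood {
    std::array<bool, 4> forward{};   // forward[i]:  is g[2..k] + x present?
    std::array<bool, 4> backward{};  // backward[i]: is x + g[1..k-1] present?
};

// Stand-in for the dictionary: a set of k-mer strings (an assumption, not SSHash itself).
neighbourhood navigate(const std::unordered_set<std::string>& kmers, const std::string& g) {
    assert(!g.empty());
    static const char bases[4] = {'A', 'C', 'G', 'T'};
    const std::string suffix = g.substr(1);                // g[2..k]
    const std::string prefix = g.substr(0, g.size() - 1);  // g[1..k-1]
    neighbourhood n;
    for (int i = 0; i != 4; ++i) {
        n.forward[i] = kmers.count(suffix + bases[i]) != 0;   // forward neighbour
        n.backward[i] = kmers.count(bases[i] + prefix) != 0;  // backward neighbour
    }
    return n;
}
```

For example, with the k-mers ACG, CGT, CGA and TAC in the set, querying ACG reports forward neighbours for x = A and x = T, and a backward neighbour for x = T only.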
Version 2.1.0
With this release, dictionary construction uses external memory to reduce RAM usage.
Version 2.0.0
No major changes compared to the previous version (other than renaming variables for consistency with the papers), but we removed a (useless) serialised 4-byte integer from skew_index, so index binary files built with previous versions are not compatible with this library release.
Version 1.2.0
This release adds a new tool called permute that re-orders (and possibly reverse-complements) the strings in an input (weighted) collection to minimize the number of runs in the abundances and, hence, optimize the encoding of the abundances.
The abundances are encoded in O(r) space on top of the space for a SSHash dictionary, where r is the number of runs (i.e., maximal substrings formed by a single abundance value) in the abundances.
The i-th abundance in the sequence, corresponding to the k-mer of identifier i, is retrieved in O(log r) time.
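To make the O(r) space and O(log r) time bounds concrete, here is a minimal run-length sketch. It is only an illustration under simple assumptions (uncompressed vectors and a binary search over run starts); the type run_length_abundances is hypothetical and is not the encoding used by the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container: stores one (start, value) pair per run, so O(r) entries.
struct run_length_abundances {
    std::vector<std::uint64_t> run_starts;  // index of the first k-mer of each run (increasing)
    std::vector<std::uint32_t> values;      // abundance value shared by the whole run
    std::uint64_t num_kmers = 0;

    explicit run_length_abundances(const std::vector<std::uint32_t>& abundances)
        : num_kmers(abundances.size()) {
        for (std::uint64_t i = 0; i != abundances.size(); ++i) {
            // A new run begins whenever the value differs from the previous one.
            if (i == 0 || abundances[i] != abundances[i - 1]) {
                run_starts.push_back(i);
                values.push_back(abundances[i]);
            }
        }
    }

    std::uint64_t num_runs() const { return values.size(); }

    // Abundance of the k-mer with identifier i, found by binary search: O(log r).
    std::uint32_t operator[](std::uint64_t i) const {
        assert(i < num_kmers);
        auto it = std::upper_bound(run_starts.begin(), run_starts.end(), i);
        return values[(it - run_starts.begin()) - 1];  // last run starting at or before i
    }
};
```

With the abundances 2, 2, 2, 1, 1, 5, 5, 5, 5 the sketch stores r = 3 runs instead of 9 values, which is why re-ordering the input to reduce the number of runs shrinks the encoding.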
Version 1.1.0
This release adds a new feature: compressed abundances.
The SSHash dictionary now can also store the abundances in highly compressed space.
Version 1.0.0
First release.