A zero-allocation, space-efficient, logarithmic-time solution in Go to the exact membership query problem.

dejavu is a Go module that aims to provide an elegant way of answering the following yes-no question: "Is element x a member of the set S?"

Use dejavu when the probabilistic nature of approximate membership query structures, such as Bloom filters, does not satisfy application requirements.
package main

import (
	"fmt"
	"net"

	"github.com/circular-time/dejavu"
)

func main() {
	var (
		cache *dejavu.Cache = dejavu.NewCache128(1 << 22) // 4,194,304 elements
		seen  bool
		value = []byte(net.IPv6loopback)
	)

	fmt.Println(
		cache.Size(),
	)
	// 92274688
	// (88 MiB)

	seen, _ = cache.Recall(value)
	fmt.Println(seen)
	// false

	cache.Insert(value)

	seen, _ = cache.Recall(value)
	fmt.Println(seen)
	// true

	fmt.Println(
		cache.Length(),
	)
	// 1
}
dejavu requires cached values to be of fixed length. Users working with data of varying sizes should consider caching not those values, but their hashes instead.
Furthermore, it stores data in a binary tree that is not self-balancing. Values that naturally occur in lexicographical order, such as time-based UUIDs, should be hashed (or have their bytes rearranged so that the random bits come first) before caching for optimal performance.
Deletion of inserted values is not currently supported! To overcome this limitation, consider using two or more Caches in rotation (as in log rotation).
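One way to sketch such a rotation scheme: inserts go to an active cache, recalls consult both the active and the standby cache, and when the active cache fills up, the standby is discarded and replaced. Every value then survives for at least one and at most two rotation periods. For the sake of a self-contained example, a map-based stand-in replaces dejavu.Cache behind a small interface; the interface and threshold logic here are illustrative, not part of the dejavu API:

```go
package main

import "fmt"

// cache abstracts the subset of operations the rotation scheme
// relies on; a map-based stand-in keeps this sketch self-contained.
type cache interface {
	Insert(value []byte)
	Recall(value []byte) bool
	Length() int
}

type mapCache map[[16]byte]struct{}

func (m mapCache) Insert(value []byte) {
	var k [16]byte
	copy(k[:], value)
	m[k] = struct{}{}
}

func (m mapCache) Recall(value []byte) bool {
	var k [16]byte
	copy(k[:], value)
	_, ok := m[k]
	return ok
}

func (m mapCache) Length() int { return len(m) }

// rotation writes to the active cache and reads from both. When the
// active cache reaches capacity, the standby generation is dropped
// and the roles are swapped, much like log rotation.
type rotation struct {
	active, standby cache
	capacity        int
}

func (r *rotation) Insert(value []byte) {
	if r.active.Length() >= r.capacity {
		r.standby = mapCache{} // discard the oldest generation
		r.active, r.standby = r.standby, r.active
	}
	r.active.Insert(value)
}

func (r *rotation) Recall(value []byte) bool {
	return r.active.Recall(value) || r.standby.Recall(value)
}

func main() {
	r := &rotation{active: mapCache{}, standby: mapCache{}, capacity: 2}

	r.Insert([]byte("a"))
	r.Insert([]byte("b"))
	r.Insert([]byte("c")) // triggers a rotation; "a" and "b" become standby

	fmt.Println(r.Recall([]byte("a")))
	// true

	r.Insert([]byte("d"))
	r.Insert([]byte("e")) // second rotation; "a" and "b" are forgotten

	fmt.Println(r.Recall([]byte("a")))
	// false
}
```

Values are thus "deleted" in bulk, one generation at a time, rather than individually.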
dejavu.Cache occupies exactly n (log k + 2 log n) bits of memory to store n elements out of a set of k possibilities. For example, it would take 88 MiB to maintain a cache of more than four million IPv6 addresses.
The additional 2 log n per element above the theoretical baseline of log k is incurred to maintain a binary tree structure, as a trade-off against time complexity.
The Go runtime has its own overheads that add to total memory consumption.
cache.Insert() and cache.Recall() are binary tree operations with an average time complexity of O(log n).
In the worst-case scenario, where values are cached in ascending or descending order, the binary tree degenerates into a linked list with O(n) complexity, at which point it would be more space-efficient to use a regular list, of space complexity Ω(n log k).
goos: linux
goarch: arm64
pkg: github.com/circular-time/dejavu
BenchmarkCacheInsert
BenchmarkCacheInsert-2 2290011 739.7 ns/op 0 B/op 0 allocs/op
BenchmarkCacheRecall
BenchmarkCacheRecall-2 2246784 759.0 ns/op 0 B/op 0 allocs/op
Experiments suggest that dejavu.Cache is roughly two times more memory-efficient than a Go map[[16]byte]struct{} with the same number of keys, and somewhere in the region of ten times more so than an LMDB instance holding the same number of records consisting of 128-bit keys and 0-bit values.
package main

import (
	"testing"

	"github.com/circular-time/dejavu"
)

const (
	n = 1 << 22 // 2^22 = 4,194,304
)

func BenchmarkDejavu(b *testing.B) {
	var (
		i int
	)

	for i = 0; i < b.N; i++ {
		dejavu.NewCache128(n)
	}
}

func BenchmarkMap(b *testing.B) {
	var (
		i int
	)

	for i = 0; i < b.N; i++ {
		_ = make(map[[16]byte]struct{}, n)
	}
}
BenchmarkDejavu
BenchmarkDejavu-2 1105 1475508 ns/op 92274688 B/op 1 allocs/op
BenchmarkMap
BenchmarkMap-2 592 2558122 ns/op 160432152 B/op 2 allocs/op