Conceptual Question #9

Open
mrjjwright opened this issue Mar 29, 2015 · 1 comment

Comments

@mrjjwright

Hi,

I am looking for an efficient, persistent, immutable key-value store to use in an application I am building for the Mac. Whenever the user changes any key-value pair in my app, I want to create a new immutable record that links back to the previous one. I don't want to rely too much on language-level persistent data structures (e.g. as found in Clojure or Immutable.js); I want to work solely with data, with a guarantee that everything is persisted efficiently on disk. I like the look of discodb (I would have to create an Objective-C wrapper around it) and am trying to grok it more. I didn't find anything in the documentation about mutating data. Is it OK, and efficient, to create a new version of the database each time, presumably with some naming convention for older versions?

@bauman

bauman commented Jun 9, 2015

You will pay a considerable I/O penalty, and possibly a CPU penalty, by doing so.

Data is stored randomly in the blob. Creating a new blob from the data in the old blob requires a full copy of the old blob, which will likely cause random seeks across the first blob. At minimum, you'll need to memory map the whole blob each time.
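To make the cost concrete, here is a minimal sketch of what "one new version per change" implies when the store is immutable. It does not use discodb; a read-only Python mapping stands in for a blob, and `rebuild_with_change` is a hypothetical helper written for this illustration.

```python
from types import MappingProxyType
from typing import Mapping, Tuple


def rebuild_with_change(
    old: Mapping[str, str], key: str, value: str
) -> Tuple[Mapping[str, str], int]:
    """Build a new read-only snapshot that differs from `old` in one key.

    Returns the new snapshot and the number of entries that had to be
    read from the old snapshot to build it.
    """
    copied = 0
    fresh = {}
    for k, v in old.items():  # every existing entry is visited and copied
        fresh[k] = v
        copied += 1
    fresh[key] = value  # the single change the user actually made
    return MappingProxyType(fresh), copied


# Stand-in for an existing immutable blob with 1000 entries.
v1 = MappingProxyType({f"key{i}": f"value{i}" for i in range(1000)})

# One edit still reads and rewrites all 1000 entries.
v2, entries_copied = rebuild_with_change(v1, "key42", "edited")
```

The old snapshot `v1` is left untouched, which is the property the question asks for, but the work per change grows with the size of the store rather than with the size of the change.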

Depending on where you place the wrapper, you will be doing a lot of type casting. The current Python wrapper performs a C string to Python string conversion for every key and every value on read. That is computationally expensive compared to the other work the library does.

A full copy (using the release build) would cast blob > cstr > String > cstr > blob for every key and every value. You may incur additional string copies in your application depending on how you pass strings around. Python passes strings by value by default, so a Python application likely has 2 additional string copies.
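The conversion chain can be sketched roughly as follows. This is an illustration only, not the discodb wrapper: raw `bytes` stand in for the C strings stored in the blob, and `copy_with_conversions` is a hypothetical helper that counts how many conversions a full copy performs.

```python
from typing import List, Tuple

Pair = Tuple[bytes, bytes]


def copy_with_conversions(old_items: List[Pair]) -> Tuple[List[Pair], int]:
    """Copy (key, value) byte pairs through str and back, counting conversions."""
    conversions = 0
    new_items = []
    for raw_key, raw_value in old_items:
        # Read side: raw bytes become language-level strings.
        key = raw_key.decode("utf-8")
        value = raw_value.decode("utf-8")
        # Write side: strings become raw bytes again for the new blob.
        new_items.append((key.encode("utf-8"), value.encode("utf-8")))
        conversions += 4  # two decodes and two encodes per pair
    return new_items, conversions


# Three stand-in entries; a real blob would hold many more.
items = [(b"k%d" % i, b"v%d" % i) for i in range(3)]
copied_items, conversion_count = copy_with_conversions(items)
```

The copied data is byte-for-byte identical to the input, so every one of those conversions is overhead paid only to move data from one blob to the next.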

Moral of the story: random key access will be extremely efficient, but full blob copies will be slow. Only you know how often you will perform each operation.
