Recover file store previous state at open #1632
Closed
nymius wants to merge 3 commits into bitcoindevkit:master from nymius:I1517/append-after-open-causes-overwrites
I'm probably missing some background but what's the point of doing this unless you return the data you read?
I think a PR here makes sense because we got rid of the "load" system we had before. Maybe this needs a big re-think, including deleting lots of the API here. We probably just want `append_changeset`, `create`, and `load` as the API. Does anyone need to actually iterate changesets?
I agree that the `FileStore` has APIs that users likely don't need. IMO, implementing these APIs in the following fashion could make `FileStore` much simpler.

Key APIs:

- `append_changeset`: We should ensure that duplicate changesets are not stored. Currently, users must call `aggregate_changesets` to remove duplicates, but the store isn't updated with the final unique changeset. Instead, we can make `append_changeset` automatically append the given changeset, aggregate the stored ones to remove duplicates, and then store the final aggregated changeset back into the `FileStore`.
- `load`: As suggested by @ValuedMammal, it makes sense to open the `FileStore` in `append` mode rather than `write` mode. This places the file pointer at the end of the file, avoiding overwrites and making it easier to append new changesets.

Other considerations:

- `aggregate_changesets`: While I'm not certain whether downstream users will need this API, it seems valuable to keep it exposed for now.
- Iteration over changesets: From a user's perspective, iterating changesets is unnecessary. Currently, it's only used to collect changesets for aggregation. We could either make it private or move the iteration logic directly into `aggregate_changesets`.

By handling changeset duplication within `append_changeset`, we can eliminate the need for users to manually aggregate changesets or iterate over them.
I agree with your vision.

In this PR I proposed a compromise solution guided by the principle of stabilizing the API rather than breaking it. I don't know if that's currently something BDK wants to do.

For example, I discarded the `append` solution because there seemed to be an expected behavior hidden behind the interaction of two functions: (1) `open` not moving the file pointer to the EOF when accessing the file, and (2) `append_changeset` setting the file pointer (and length) to the end of the last changeset written. The interaction of both is what made the `append_changeset_truncates_invalid_bytes` test pass in the original code, because (1) leaves the file pointer at the end of the `MAGIC_BYTES` and (2) removes the invalid bytes completely by shortening the file to the end of the last written changeset.

That's also why I don't provide the user with the well-encoded changesets on error, as that would have implied a more complex structure using generics (to consider the changeset). I traded it for the minimum data needed to locate and isolate the failure in the file. You may find a clearer idea of this in the `fail_to_reopen_if_write_is_short` test, at this line.

If there is common agreement to do the refactoring above instead of trying to fix the current API, I can close this PR and try to implement the changes in another one.
I don't understand why we want to ensure duplicate changesets are not stored. They do no harm and only happen in the case of some kind of programmer error.

There's no issue with stabilizing the API of this crate, I think. We can do what we want with it. To me at least, this `reopen` makes it make even less sense because it reads in every entry but doesn't return it, which is a waste. I agree with @nymius that opening the file with append is a nice idea in theory, but it's actually counterproductive here. There's no magic to opening files with append; it just makes the operating system handle what we can do better.

I vote for a PR cutting the API down to:

- `create` -- rename of `create_new`.
- `load` -- does what `open` does, aggregates the changeset, and leaves the file pointer at the right spot. Errors if the file didn't exist or the magic bytes are wrong.
- `append` -- does what `append_changeset` does.
- `create_or_load` -- does what `open_or_create_new` does.

Then updating the README to explain that this is the simplest implementation of a bdk-compatible database. It's useful for development but should probably not be used beyond that. No effort has been made to make it backwards compatible, so you have to do that yourself. Link them to docs on the sqlite feature for what they should use in production.
I think `load` should essentially return `Result<(C, Self), Error>` so that we can get rid of `.aggregate_changeset`. This forces the caller to only read once.
Got the point.

On ensuring this: we can get rid of iterating over the changesets each time the user wants to get the total changeset. Though in my proposed impl we still have to iterate at the end, so it doesn't make much difference. I'm not sure about this, but if that's the case, then it does not make sense to try to keep a unique set of changesets.

Combining the `load/open` and `aggregate_changeset` logic does make sense because, IMO, both are correlated: in order to load a wallet, a user has to get the aggregated changeset, for which they have to load the `store` instance. This way, we don't need to create another public API, and users don't have to manually call a function to get the changeset required for loading the wallet.
If loading gets too slow, we can just have a method to merge changesets. I.e. overwrite the file with just one entry, the new aggregate changeset. Or aggregate x number of changesets at the end.
It does NOT make sense to have a check that stops duplicate changesets being stored. How are we going to implement this? Have an aggregate changeset of all persisted changes in memory that ensures the new changeset is not a subset of it? That typically doesn't happen, and if it does, it means that there is a problem with the in-memory data structures. The in-memory data structures should only output a changeset because there are changes to it.
I have found an issue with this approach. The `(C, Self)` return type is hard to adapt to the `WalletPersister::initialize` trait. By the documentation, I don't have any file path available to load the store from, and I must return all data stored in the `persister` (i.e. return the `AggregatedChangeset`). `file_store`'s trait implementation currently in master assumes the `persister` (i.e. `file_store::Store`) has been opened or loaded before reaching the trait method (look how `aggregate_changesets()` is called without the `create_new`/`open` ceremony).

Summing up, the constraints for `file_store::Store` are:

- The store cannot be opened inside `WalletPersister::initialize` because `file_path` is not in scope at that moment.
- Adding a `file_path` parameter to `WalletPersister::initialize` would break the API and force changes on the `rusqlite` implementation.

Following LLFourn's idea, I propose the following new API for `file_store::Store`:

- `create`: rename of `create_new`.
- `load`: open the file, place the file pointer at EOF, and return `file_store::Store`.
- `dump`: aggregate changesets and return the aggregated changeset.
- `append`: rename of `append_changeset`.
- `load_or_create`: `Store::create` or `Store::load`.
Working on this I realized we can have both: `(C, Store)` as the return type of `Store::load`, and a `Store::dump` method to call in `<bdk_file_store::Store<ChangeSet> as WalletPersister>::initialize`, so I will keep both.