Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage handles 1: Introduce ChunkStoreHandle #7917

Closed
wants to merge 3 commits into from
Closed

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Oct 28, 2024

This introduces ChunkStoreHandle, a Send+Sync+Clone handle to a ChunkStore.

Shoe-horning a sharable, thread-safe, read-write handle into an existing codebase that A) makes heavy use of work stealing and B) carries its handles both in temporary contexts, and self, and function parameters is pretty scary -- it's a recipe for non-deterministic deadlocks and nasty race conditions.

That said:

  • We know we have to do this for various reasons (query streaming, parallel ingestion, etc), so we have to start somewhere.
  • The codebase until now was borrowck compliant, which means there's very little chance to screw things up at the moment -- it'll get scary as things evolve into a more parallel-heavy design from there.

But yes, we have a lot of refactoring work ahead of us to get this in a sane place. One step at a time...
This work will probably happen as part of #4298.

NOTE: This PR does not update tests, so they will fail to compile (all libs and binaries work). That's on purpose: the next PR will break all these same tests again with the introduction of the QueryCacheHandle, fixing them now will be a pure waste of time.
I still want this to be its own PR though, that is complex enough as is.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@teh-cmc teh-cmc added ⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself do-not-merge Do not merge this PR include in changelog labels Oct 28, 2024
@@ -162,11 +162,13 @@ impl QueryHandle<'_> {
fn init_(&self) -> QueryHandleState {
re_tracing::profile_scope!("init");

let store = self.engine.store.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about coarseness here

@@ -350,7 +345,10 @@ impl EntityDb {
}

pub fn add_chunk(&mut self, chunk: &Arc<Chunk>) -> Result<Vec<ChunkStoreEvent>, Error> {
let store_events = self.data_store.insert_chunk(chunk)?;
// NOTE: using `write_arc` so the borrow checker can make sense of the partial borrow of `self`.
let mut store = self.data_store.write_arc();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about write-coarseness here

@@ -433,15 +430,18 @@ impl EntityDb {
fn gc(&mut self, gc_options: &GarbageCollectionOptions) -> Vec<ChunkStoreEvent> {
re_tracing::profile_function!();

let (store_events, stats_diff) = self.data_store.gc(gc_options);
// NOTE: using `write_arc` so the borrow checker can make sense of the partial borrow of `self`.
let mut store = self.data_store.write_arc();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about write-coarseness here

let store_events = self.data_store.drop_time_range(timeline, drop_range);
self.on_store_deletions(&store_events);
// NOTE: using `write_arc` so the borrow checker can make sense of the partial borrow of `self`.
let mut store = self.data_store.write_arc();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about write-coarseness here

@@ -467,9 +470,12 @@ impl EntityDb {
pub fn drop_entity_path(&mut self, entity_path: &EntityPath) {
re_tracing::profile_function!();

let store_events = self.data_store.drop_entity_path(entity_path);
// NOTE: using `write_arc` so the borrow checker can make sense of the partial borrow of `self`.
let mut store = self.data_store.write_arc();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about write-coarseness here

@@ -514,6 +520,8 @@ impl EntityDb {
) -> impl Iterator<Item = ChunkResult<LogMsg>> + '_ {
re_tracing::profile_function!();

let store = self.data_store.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about read-coarseness here

@@ -606,7 +614,8 @@ impl EntityDb {
});
}

for chunk in self.store().iter_chunks() {
let store = self.data_store.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bring this at the top and add note about read-coarseness here

query: &LatestAtQuery,
entity_path: &EntityPath,
component_names: impl IntoIterator<Item = ComponentName>,
) -> LatestAtResults {
re_tracing::profile_function!(entity_path.to_string());

let store = self.store.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about read-coarseness here

query: &RangeQuery,
entity_path: &EntityPath,
component_names: impl IntoIterator<Item = ComponentName>,
) -> RangeResults {
re_tracing::profile_function!(entity_path.to_string());

let store = self.store.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about read-coarseness here

@@ -47,6 +47,8 @@ impl<'a> DataUi for ComponentPathLatestAtResults<'a> {
UiLayout::SelectionPanelLimitHeight | UiLayout::SelectionPanelFull => num_instances,
};

let store = db.store().read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about read-coarseness here

let results =
db.query_caches()
.latest_at(db.store(), query, entity_path, [*component_name]);
let results = db
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get these two at the top of the else block at the very least, and add note about read-coarseness here

@@ -173,6 +174,7 @@ fn component_list_ui(
let component_path = ComponentPath::new(entity_path.clone(), component_name);
let is_static = db
.store()
.read()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely bring that at top-level and add note about read-coarseness here

@@ -337,15 +338,17 @@ fn entity_tree_stats_ui(
" (excluding subtree)"
};

let store = db.store().read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either that goes into the else block, or we need to modify all these subtree_stats method to explicitly take a store handle.

Either way we need a note regarding recursive locking.

@@ -380,7 +383,7 @@ fn entity_tree_stats_ui(

if 0 < timeline_stats.total_size_bytes && 1 < num_temporal_rows {
// Try to estimate data-rate:
if let Some(time_range) = db.store().entity_time_range(timeline, &tree.path) {
if let Some(time_range) = store.entity_time_range(timeline, &tree.path) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh. Modifying the substree_stats methods seems like the better approach then.

@@ -707,12 +707,11 @@ fn visit_relevant_chunks(
) {
re_tracing::profile_function!();

let store = db.store().read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note about read coarseness

@@ -751,35 +751,32 @@ impl TimePanel {
);
}

let store = entity_db.store().read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note about read coarseness here.

Also these show_ methods should probably take handles as parameters... but we cannot patch everything in the world in this PR. Maybe a TODO.

Comment on lines +619 to +620
// NOTE: Do NOT try and `read_arc()` the store here: `purge_fraction_of_ram` is going to
// access it mutably in all sorts of ways, it's a guaranteed deadlock.
Copy link
Member Author

@teh-cmc teh-cmc Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good example of what I mean by "carries its handles both in temporary contexts, and self, and function parameters". Thankfully that one is fairly obvious -- the hundreds that we don't know about are the real scary ones.

@@ -804,15 +806,15 @@ impl StoreHub {
.and_then(|blueprint_id| self.store_bundle.get(blueprint_id));

let blueprint_stats = blueprint
.map(|entity_db| entity_db.store().stats())
.map(|entity_db| entity_db.store().read().stats())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ones definitely deserved to be brought to the top with a read coarseness notice...

@@ -262,6 +261,7 @@ impl SpaceViewBlueprint {
store_context.blueprint_timepoint_for_writes(),
blueprint
.store()
.read()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That definitely has to move upstairs + notice

@@ -381,6 +381,8 @@ impl DataQueryPropertyResolver<'_> {
recursive_property_overrides: &IntMap<ComponentName, OverridePath>,
handle: DataResultHandle,
) {
let blueprint_store = blueprint.store().read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coarseness notice

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 28, 2024

Even for a quick shoehorning, this really leaves a lot to be desired. Let me implement the QueryCacheHandle, and see if they are some low hanging fruits from there that could help make this a bit more saner.

@teh-cmc teh-cmc marked this pull request as draft October 28, 2024 14:47
@teh-cmc teh-cmc changed the title Introduce ChunkStoreHandle Storage handles 1: Introduce ChunkStoreHandle Oct 28, 2024
@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 28, 2024

Even for a quick shoehorning, this really leaves a lot to be desired. Let me implement the QueryCacheHandle, and see if they are some low hanging fruits from there that could help make this a bit more saner.

It looks like I'm close to having a half-decent design that keeps the pitfalls at bay, and basically gives us the best of both worlds.
Should have a PR tomorrow.

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 29, 2024

This turned out to be way too brittle (unsurprisingly). Proper PR with safe handles coming soon ™️.

@teh-cmc teh-cmc closed this Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Do not merge this PR include in changelog ⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant