Optimization: only issue new dependency tasks for "parent" documents which contain a pinned relation(list) to "child" document #577

Open
sandreae opened this issue Oct 11, 2023 · 1 comment

sandreae commented Oct 11, 2023

In our dependency task we need to account for cases where a newly reduced document has skipped over a view that is pinned by another document (this happens when operations are batched). In that case we want to issue new dependency tasks for each of these "parent" documents (the documents which contain the pinned relation). We currently do this by issuing dependency tasks for all documents whose schema contains a pinned relation to the schema of the newly reduced document.

We could improve this a bit by identifying only the documents which actually contain the pinned relation we are looking for. This can be done by querying operation values of type pinned_relation or pinned_relation_list where the value matches any child_document_view_id for the current document.
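To make the difference between the two strategies concrete, here is a minimal in-memory sketch. It does not touch the store at all: `ParentDocument`, `parents_by_schema` and `parents_by_pinned_value` are hypothetical names made up for illustration, and which view ids count as "child view ids" is left to the caller.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a potential "parent" document.
struct ParentDocument {
    document_id: String,
    schema_id: String,
    /// View ids this document pins via `pinned_relation` or
    /// `pinned_relation_list` fields.
    pinned_view_ids: Vec<String>,
}

/// Current approach as described above: select every document whose schema
/// contains a pinned relation to the child's schema.
fn parents_by_schema<'a>(
    documents: &'a [ParentDocument],
    schemas_pinning_child_schema: &HashSet<&str>,
) -> Vec<&'a str> {
    documents
        .iter()
        .filter(|doc| schemas_pinning_child_schema.contains(doc.schema_id.as_str()))
        .map(|doc| doc.document_id.as_str())
        .collect()
}

/// Proposed approach: select only documents which actually pin one of the
/// child's view ids.
fn parents_by_pinned_value<'a>(
    documents: &'a [ParentDocument],
    child_view_ids: &HashSet<&str>,
) -> Vec<&'a str> {
    documents
        .iter()
        .filter(|doc| {
            doc.pinned_view_ids
                .iter()
                .any(|view_id| child_view_ids.contains(view_id.as_str()))
        })
        .map(|doc| doc.document_id.as_str())
        .collect()
}
```

With many documents of the same parent schema but only a few that pin the child, the second function returns a much shorter list, which is the reduction in issued tasks this issue is after.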

This is really similar to what we do when checking whether a blob should be garbage collected (basically, whether it has no "parent" documents) here:

/// Helper for getting the document ids of any document which relates to the specified document.
///
/// Optionally pass in a `SchemaId` to restrict the results to documents of a certain schema.
async fn reverse_relations(
    pool: &AnyPool,
    document_id: &DocumentId,
    schema_id: Option<SchemaId>,
) -> Result<Vec<String>, SqlStoreError> {
    let schema_id_condition = match schema_id {
        Some(schema_id) => format!("AND document_views.schema_id = '{}'", schema_id),
        None => String::new(),
    };

    query_scalar(&format!(
        "
        SELECT
            document_view_fields.document_view_id
        FROM
            document_view_fields
        LEFT JOIN
            operation_fields_v1
        ON
            document_view_fields.operation_id = operation_fields_v1.operation_id
        AND
            document_view_fields.name = operation_fields_v1.name
        LEFT JOIN
            document_views
        ON
            document_view_fields.document_view_id = document_views.document_view_id
        WHERE
            operation_fields_v1.field_type
        IN
            ('pinned_relation', 'pinned_relation_list', 'relation', 'relation_list')
        {schema_id_condition}
        AND
            operation_fields_v1.value IN (
                SELECT document_views.document_view_id
                FROM document_views
                WHERE document_views.document_id = $1
            ) OR operation_fields_v1.value = $1
        ",
    ))
    .bind(document_id.to_string())
    .fetch_all(pool)
    .await
    .map_err(|e| SqlStoreError::Transaction(e.to_string()))
}
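A narrower variant for the dependency task could look roughly like the sketch below. It only builds the SQL string: `pinned_parents_query` is a hypothetical helper, not existing code, and the query has not been run against the real tables. It assumes that each pinned view id (including each item of a `pinned_relation_list`) is stored as its own `operation_fields_v1.value`, which the comparison in `reverse_relations` suggests, and that the caller binds one candidate child view id per placeholder.

```rust
/// Hypothetical helper: builds a query selecting the ids of documents which
/// pin any of `candidate_count` child view ids.
///
/// Returns `None` when there are no candidates, since an empty `IN ()` list
/// is not valid SQL.
fn pinned_parents_query(candidate_count: usize) -> Option<String> {
    if candidate_count == 0 {
        return None;
    }

    // One positional placeholder per candidate view id: "$1, $2, ...".
    let placeholders = (1..=candidate_count)
        .map(|i| format!("${}", i))
        .collect::<Vec<_>>()
        .join(", ");

    Some(format!(
        "
        SELECT DISTINCT
            document_views.document_id
        FROM
            document_view_fields
        INNER JOIN
            operation_fields_v1
        ON
            document_view_fields.operation_id = operation_fields_v1.operation_id
        AND
            document_view_fields.name = operation_fields_v1.name
        INNER JOIN
            document_views
        ON
            document_view_fields.document_view_id = document_views.document_view_id
        WHERE
            operation_fields_v1.field_type IN ('pinned_relation', 'pinned_relation_list')
        AND
            operation_fields_v1.value IN ({})
        ",
        placeholders
    ))
}
```

Compared to `reverse_relations`, this drops the unpinned `relation` and `relation_list` types and returns parent document ids rather than view ids. All conditions are joined with `AND`, so there is no `AND`/`OR` precedence to reason about; in the query above, the trailing `OR` binds more loosely than the preceding `AND` conditions, which may or may not be intended.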

sandreae changed the title from "Only issue new dependency tasks for "parent" documents which contain a pinned relation(list) to "child" document" to "Optimization: only issue new dependency tasks for "parent" documents which contain a pinned relation(list) to "child" document" on Oct 11, 2023
sandreae commented

It would be worth discussing whether this is actually an optimization: we would be trading fewer issued tasks for a more complex (but maybe not that complex?) query.
