Investigate Cloud Spanner's managed TTL feature #1577

Open
data-sync-user opened this issue Jun 28, 2024 · 0 comments

Syncstorage supports automatic deletion of expired records (per the TTL field on bso records) via purge_ttl.py. This script runs nightly as a background task, during periods of low database activity.

Such a background script was the only real option for TTLs when we first switched to Cloud Spanner in 2020, and we told the Google Cloud team that we wanted a Spanner-native TTL feature to replace it. They then added support for native TTLs in late 2021: https://cloud.google.com/blog/products/spanner/reduce-costs-and-simplify-compliance-with-managed-ttl-in-spanner

Spanner’s managed TTL support could potentially reduce costs, as it provides TTL support “for free” without the need to run (or maintain) the background script.

Distributed databases like Spanner often have a background “compaction” process that actually removes previously deleted data from storage (oftentimes deletes are more like writes that “queue” the deletion for that later process). Such databases, if they offer native TTL support, tend to implement it as part of that compaction process. This ends up being much more efficient than a manually run background script like purge_ttl that scans for expired records. Eliminating the Spanner CPU load incurred by our script could potentially help our costs in the future (especially when we switch to Spanner’s autoscaler).

We also incur a significant amount of extra storage for the indices needed by purge_ttl (though it's not clear whether they're even strictly necessary for the script). Managed TTL support very likely negates the need for these extra indices entirely, potentially resulting in cost savings.

Let’s take a close look at the managed TTL support and, assuming it can accommodate our needs, formulate a high-level plan for how we’ll implement its use in syncstorage and how we’ll migrate over to it.
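As a starting point for that investigation, here is a minimal sketch of the DDL that Spanner's managed TTL uses (a row deletion policy on a TIMESTAMP column). The table name `bsos`, the column name `expiry`, and the helper `row_deletion_policy_ddl` are assumptions for illustration only and need to be checked against our actual schema.

```python
def row_deletion_policy_ddl(table: str, column: str, days: int = 0) -> str:
    """Build the DDL that attaches a managed TTL policy to a Spanner table.

    `column` must be a TIMESTAMP column; rows become eligible for deletion
    once that timestamp is more than `days` days in the past.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    for name in (table, column):
        # Guard against injecting arbitrary text into the DDL string.
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"ALTER TABLE {table} "
        f"ADD ROW DELETION POLICY (OLDER_THAN({column}, INTERVAL {days} DAY))"
    )


# Hypothetical table and column names; verify against the real schema.
print(row_deletion_policy_ddl("bsos", "expiry"))
```

The resulting statement would be applied as an ordinary schema update. One thing to confirm during the investigation: managed TTL deletes rows asynchronously some time after they expire, so reads would presumably still need to filter out expired records themselves.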

┆Issue is synchronized with this Jira Task
