Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce complexity of database interactions #423

Open
3 tasks
sdsantos opened this issue Jan 27, 2025 · 3 comments
Open
3 tasks

Reduce complexity of database interactions #423

sdsantos opened this issue Jan 27, 2025 · 3 comments

Comments

@sdsantos
Copy link
Collaborator

Reference ooni/probe#2024

Simplifications:

  • Drop the urls and networks tables
  • Drop the results table
  • Make the creation of db measurement entries stateless
@sdsantos sdsantos moved this to Icebox in Roadmap Jan 27, 2025
@sdsantos sdsantos added this to Roadmap Jan 27, 2025
@sdsantos
Copy link
Collaborator Author

sdsantos commented Jan 27, 2025

Drop the urls and networks tables

The Url and Network tables can be dropped and their columns moved into the Measurement table.

Advantages:

  • A lot less joins should make queries faster

Disadvantages:

  • More space used, due to duplications:
    • URLs are usually tested multiple times across time.
    • Networks don't vary much across time, and they used to be linked to Results, not Measurements.
  • We currently show the same Network for a result, but this would mean there could be multiple Networks for the same result_id.

@sdsantos
Copy link
Collaborator Author

sdsantos commented Jan 27, 2025

Drop the results table

Going column by column of the Result table:

  • id (result_id on measurement)

  • descriptor_name / descriptor_runId / descriptor_revision (moved to measurement)

  • start_time (could be fetched from the first measurement)

  • is_viewed (could be moved to the measurement and aggregated: only true if true for all measurements of the same result)

  • data_usage_up / data_usage_down (needs to be moved to the measurement and summed)

  • failure_msg (not used at the moment, can be dropped)

  • network_id (we're planning to drop the network table, meaning measurements will need to store the network info)

  • task_origin (moved to the measurement)

  • is_done
    Used for:

    • knowing when we can re-run a websites result
    • showing a loading animation when result is ongoing
    • showing a result has an error because it's done but empty of measurements

    Just checking if measurements are done is not enough to deduce this state, because there are moments between one measurement being done and another one being created.


Summary, most columns are solvable, only is_done is more tricky, but it's not necessary for core features.

When we're showing the results list, we're already aggregating information from all the measurements that are part of a result. So going straight to the measurements table could have a slightly performance improvement. Space should increase due to the duplicated information. And some care will be needed to ensure information is coherent (for example, the same descriptor and task origin are used for the same result_id).

@sdsantos
Copy link
Collaborator Author

sdsantos commented Jan 27, 2025

Make the creation of db measurement entries stateless

Right now we show the ongoing Measurement inside the Results screen. We would need to drop that feature.
We also update the fields is_uploaded, is_upload_failed and upload_failure_msg when we retry to upload a measurement. But we could create a new entry instead, when that happens.

Nevertheless, SQLdelight handles well changes in values inside DB rows, and even provides a reactive stream of the row state changes. So this simplification may not gain us much in terms of complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants