Database scale querying #25

Open
4 of 10 tasks
AndyRae opened this issue Jan 13, 2025 · 2 comments

@AndyRae
Member

AndyRae commented Jan 13, 2025

Is this the right issue type?

  • Yes, I'm planning work for this project team.

Summary

Bunny has so far been tested only on very modest OMOP dataset sizes. We need to ensure that queries also resolve quickly on much larger datasets.
For example, population sizes across the SDE regions of the UK go up to 10 million people, and ideally Bunny will handle datasets much larger than this.

This deliverable is not yet about fixing or optimising. It is about setting up the infrastructure to test Bunny at this scale, beginning to understand any issues, and setting an initial benchmark.

Benchmarks we are interested in initially:

  • Time taken (T) to resolve a simple cold query (Q) across N patients
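
One way to measure T for a query Q over N patients is sketched below. It is a minimal illustration, not Bunny's actual query path: it uses an in-memory SQLite table as a stand-in for the OMOP `person` table, and the helper names (`build_person_table`, `time_cold_query`, `BenchmarkResult`) are hypothetical. "Cold" here only means the first execution on a fresh connection. A real cold run against a production database would also need the database and OS caches cleared, which this sketch does not attempt.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    n_patients: int   # N: number of synthetic patients in the table
    seconds: float    # T: wall-clock time for the single query run
    row_count: int    # result of the count query, as a sanity check


def build_person_table(n_patients: int) -> sqlite3.Connection:
    """Create an in-memory stand-in for the OMOP person table (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, year_of_birth INTEGER)"
    )
    # 8507 and 8532 are the OMOP standard concept IDs for male and female.
    rows = (
        (i, 8507 if i % 2 == 0 else 8532, 1940 + i % 60)
        for i in range(1, n_patients + 1)
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def time_cold_query(conn: sqlite3.Connection, sql: str, n_patients: int) -> BenchmarkResult:
    """Time one execution of a count query; the first call on a fresh connection is the 'cold' run."""
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    elapsed = time.perf_counter() - start
    return BenchmarkResult(n_patients, elapsed, count)


if __name__ == "__main__":
    # Q: a simple count query, assumed here for illustration.
    query = "SELECT COUNT(*) FROM person WHERE gender_concept_id = 8532"
    for n in (1_000, 10_000, 100_000):
        result = time_cold_query(build_person_table(n), query, n)
        print(f"N={result.n_patients:>9,}  T={result.seconds:.4f}s  rows={result.row_count:,}")
```

Building a fresh connection per N keeps each measurement independent of the previous one. For the real benchmark, the same timing wrapper could be pointed at the actual database and query.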

Acceptance Criteria

  • Bunny has been tested on OMOP datasets of 100,000 synthetic patients
  • Bunny has been tested on OMOP datasets of 1 million synthetic patients
  • Bunny has been tested on OMOP datasets of 10 million synthetic patients
  • Problems have been documented
  • Benchmarks have been documented

Tasks

  • Set up database
  • Run queries
  • Note benchmarks (this can just be commented on this issue for ease)
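
Since benchmarks may simply be commented on this issue, a small helper that renders measured timings as a Markdown table could keep those comments consistent. This is only a sketch: `format_benchmark_table` is a hypothetical name, and the input is assumed to be a mapping from patient count N to measured time T in seconds.

```python
def format_benchmark_table(results: dict[int, float]) -> str:
    """Render {N patients: T seconds} as a Markdown table, sorted by N."""
    lines = ["| N patients | T (s) |", "| ---: | ---: |"]
    for n_patients, seconds in sorted(results.items()):
        lines.append(f"| {n_patients:,} | {seconds:.2f} |")
    return "\n".join(lines)
```

The returned string can be pasted directly into a GitHub comment.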

Confirm creation

  • This issue is ready
@prquinlan
Contributor

Synthetic data created for 100k, 1M, 5M and 10M persons. Data available in our health informatics team. Created the 5M as well given 1M-10M is quite a jump.

prquinlan removed their assignment Jan 19, 2025
@prquinlan
Contributor

Tested on 100k. All ok

Projects
Status: Todo