simplify duckdb helpers that create tables from CSV files #91

Open
e-kotov opened this issue Oct 3, 2024 · 1 comment
@e-kotov
Member

e-kotov commented Oct 3, 2024

Just thinking out loud here.

Currently we have an awesome package structure where much of the modularity, flexibility, and conciseness of some functions comes from a set of .sql files that do most of the heavy lifting. These same files also let us make the package multilingual: we can keep a separate set of .sql files for a particular language and get tables translated on the fly into any language without even touching the R code.

However, there are still quite a lot of .sql files. That is because we currently have a single .sql file per significant action, such as mapping a folder of CSV files into a DuckDB table, creating ENUMs, or creating a clean table, and a separate set of such files for each spatial granularity. Internally, because the datasets are a bit different, we also have at least 3 R functions tailored to the "origin-destination", "number of trips", and "overnight stays" datasets. Each of these R functions handles a slightly different workflow, but they also have commonalities.

So maybe it is a good idea to refactor the package code so that the logic of what is done with the raw CSV data is handled to an even greater extent in .sql files, as these can contain any number of step-by-step operations. We will still need to take the values of some R variables and inject them into the SQL statements we load from .sql files, but that may lead to significantly more concise and therefore more maintainable code.
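
To make the injection step concrete, here is a minimal sketch of rendering one multi-step SQL template with values supplied by the calling code. It is written in Python only to keep it self-contained; the package itself is in R, and the same idea applies there. Everything named here is an assumption for illustration: the template text, the `view_name` and `csv_glob` placeholders, the example path, and the `render_sql` helper are hypothetical and not taken from the package.

```python
import re
from string import Template

# Hypothetical contents of ONE .sql file that bundles several steps.
# ${...} placeholders are filled in by the calling code.
SQL_TEMPLATE = """
-- step 1: map a folder of CSV files to a view
CREATE VIEW ${view_name} AS
SELECT * FROM read_csv('${csv_glob}', header = true);

-- step 2: build a cleaned table on top of the view
CREATE TABLE ${view_name}_clean AS
SELECT * FROM ${view_name};
"""


def render_sql(template: str, view_name: str, csv_glob: str) -> str:
    """Fill the placeholders of a SQL template (hypothetical helper)."""
    # Identifiers cannot be quoted like string literals, so only accept
    # plain names made of letters, digits and underscores.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", view_name):
        raise ValueError(f"not a plain SQL identifier: {view_name!r}")
    return Template(template).substitute(
        view_name=view_name,
        # Double any single quote so the path stays inside its SQL literal.
        csv_glob=csv_glob.replace("'", "''"),
    )


# Example values are made up for illustration.
rendered = render_sql(SQL_TEMPLATE, "od_district", "data/od/*.csv.gz")
print(rendered)
```

The rendered string would then be sent to DuckDB as ordinary SQL. With this layout the calling code only supplies values, and the sequence of steps lives entirely in the .sql file.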

Perhaps, since currently everything seems to be working fine (at least in the https://github.com/rOpenSpain/spanishoddata/tree/v2-codebook branch), this is not a priority for the first stable release.

@e-kotov e-kotov self-assigned this Oct 3, 2024
@Robinlovelace
Collaborator

Moving more of the code to .sql could make the package easier to port to other languages and easier to maintain. I like the idea, but I wonder whether implementing it would take more developer time than it would save through easier maintenance.
