Currently we have an awesome package structure where much of the modularity, flexibility, and conciseness of some functions comes from a set of .sql files that do most of the heavy lifting. These same files also make the package multilingual: we can keep a separate set of .sql files for a particular language and get tables translated on the fly into any language without even touching the R code.
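To make the idea concrete, here is a minimal sketch (the file paths and the helper name `load_sql` are hypothetical, not the package's actual layout) of how a language code could select a parallel set of .sql files without changing any R logic:

```r
# Hypothetical helper: pick the .sql file for a given action and language
# from a per-language directory, e.g. inst/sql/en/ or inst/sql/es/
load_sql <- function(action, lang = "en") {
  path <- file.path("inst", "sql", lang, paste0(action, ".sql"))
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# load_sql("create-clean-table", lang = "es") would read the Spanish variant,
# so translated column labels live entirely in the .sql files
```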
However, there are still quite a lot of .sql files. That is because we currently have a single .sql file per significant action, such as mapping a folder of CSV files into a DuckDB table, creating ENUMs, or creating a clean table, and a separate set of such files for each spatial granularity. So we have a lot of these. Internally, because the datasets differ slightly, we also have at least three R functions tailored to the "origin-destination", "number of trips", and "overnight stays" datasets. Each of these R functions handles a workflow that is slightly different from the others, but they also share common steps.
So maybe it is a good idea to refactor the package so that the logic applied to the raw CSV data is handled to an even greater extent in .sql files, as these can contain any number of step-by-step operations. We will still need to take the values of some R variables and inject them into the SQL statements we load from the .sql files, but the result may be significantly more concise and therefore more maintainable code.
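One way the injection step could look (a sketch only; the template file name and placeholder names are made up for illustration) is to read a .sql template and fill it with `glue::glue_sql()`, which quotes identifiers and literals safely for the connection:

```r
library(DBI)
library(duckdb)
library(glue)

con <- dbConnect(duckdb::duckdb())

# Hypothetical template, e.g. inst/sql/create-clean-table.sql, containing
# placeholders such as {`table_name`} (quoted as an identifier) and
# {csv_glob} (quoted as a literal)
sql_template <- paste(
  readLines("inst/sql/create-clean-table.sql", warn = FALSE),
  collapse = "\n"
)

# Inject R values into the template
sql <- glue::glue_sql(
  sql_template,
  table_name = "od_clean",
  csv_glob   = "data/raw/od/*.csv.gz",
  .con = con
)

# A single .sql file can hold a step-by-step pipeline of ';'-separated
# statements; run them one at a time
for (stmt in strsplit(as.character(sql), ";", fixed = TRUE)[[1]]) {
  if (nzchar(trimws(stmt))) dbExecute(con, stmt)
}

dbDisconnect(con, shutdown = TRUE)
```

This keeps the R side down to "load template, inject values, execute", while all the dataset-specific SQL lives in the .sql files.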
Moving more of the code into .sql could make the package easier to port to other languages and easier to maintain. I like the idea, but I wonder whether implementing it would take more developer time than the easier maintenance would save.
Just thinking out loud here.
Perhaps, since currently everything seems to be working fine (at least in the https://github.com/rOpenSpain/spanishoddata/tree/v2-codebook branch), this is not a priority for the first stable release.