simplify duckdb helpers that create tables from CSV files #91

e-kotov · 2024-10-03T22:02:45Z

Just thinking out loud here.

Currently we have an awesome package structure in which much of the modularity, flexibility, and conciseness of some functions comes from a set of .sql files that do most of the heavy lifting. These same files also enable us to make the package multilingual: we can have a separate set of .sql files for a particular language and get tables translated on the fly into that language without even touching the R code.

However, there are still quite a lot of .sql files. That is because we currently have a single .sql file per significant action, such as mapping a folder of CSV files into a DuckDB table, creating ENUMs, or creating a clean table, and a separate set of such files for each spatial granularity. Internally, because the datasets differ a bit, we also have at least 3 R functions tailored to the "origin-destination", "number of trips", and "overnight stays" datasets. Each of these R functions handles a slightly different workflow, but they also have a lot in common.

So maybe it is a good idea to refactor the package code so that the logic of what is done with the raw CSV data is handled to an even greater extent in the .sql files, as these can contain any number of step-by-step operations. We will still need to take some values of R variables and inject them into the SQL statements we load from the .sql files, but the result may be significantly more concise and therefore more maintainable code.
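To make the "inject values into SQL loaded from .sql files" step concrete, here is a minimal, language-agnostic sketch in Python using only the standard library. It is not the package's actual code: the template text, the view name `od_csv_districts`, the CSV glob, and the helper names are all hypothetical, and the template is inlined as a string where the package would read it from a .sql file. In R the same idea could probably be expressed with something like `glue::glue_sql()`.

```python
from string import Template

# Hypothetical stand-in for the text of one .sql file shipped with the package.
# $view_name and $csv_glob are placeholders filled in by the calling code.
CREATE_VIEW_SQL = """\
CREATE OR REPLACE VIEW $view_name AS
SELECT * FROM read_csv($csv_glob, header = true);
"""


def sql_identifier(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sql_string(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_sql(template_text: str, view_name: str, csv_glob: str) -> str:
    """Substitute quoted values into the template.

    Template.substitute raises KeyError if the template contains a
    placeholder for which no value was supplied.
    """
    return Template(template_text).substitute(
        view_name=sql_identifier(view_name),
        csv_glob=sql_string(csv_glob),
    )


# Hypothetical view name and path, for illustration only.
rendered = render_sql(CREATE_VIEW_SQL, "od_csv_districts", "data/districts/*.csv.gz")
print(rendered)
```

With this shape, the calling function only decides which template to load and which values to pass, while the sequence of operations lives in the .sql text. Quoting identifiers and string literals separately keeps the injected values from breaking the statement.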

Perhaps, since currently everything seems to be working fine (at least in the https://github.com/rOpenSpain/spanishoddata/tree/v2-codebook branch), this is not a priority for the first stable release.

Robinlovelace · 2024-10-04T21:52:14Z

Moving more of the code to .sql could make the package easier to port to other languages and easier to maintain. I like the idea, but I wonder whether implementing it would take more developer time than it would save through easier maintenance.

e-kotov self-assigned this Oct 3, 2024
