Data V2 #59

Closed · wants to merge 11 commits from the data_v2 branch

Conversation

emackev (Contributor) commented Oct 31, 2023

This PR is for developing the next version of our data pipeline. See docs/data_sources.ipynb for the variables that are currently in the dataset. Desired features for the new pipeline:

  • Data cleaning process documented in code. Each variable has:
  1. A high-level description in data_sources.ipynb.
  2. An initial cleaning pipeline (pre-SQL): convert info from websites into wide format with GeoFIPS, GeoName, etc. columns, and perform any dataset-specific cleaning. gdp_wide.csv is the canonical example. The results of the initial cleaning should be stored in data/raw.
  3. Final steps to make each variable compatible with the rest of our data (handle exclusions, standardize data, etc.). This final step can use the same code for all datasets (i.e., the clean_variable code). The results of the final steps should be stored in data/processed (as 4 CSVs: wide vs. long, raw vs. std). A rough sketch of this flow is below.
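
A minimal sketch of the shared final step I have in mind, assuming the data/raw and data/processed layout above. The function name, the location of included_counties.csv, the min-max standardization, and the assumption that the remaining columns are years are all placeholders, not the actual clean_variable implementation:

```python
from pathlib import Path

import pandas as pd


def clean_variable_sketch(name: str, raw_dir: Path, processed_dir: Path) -> None:
    """Placeholder for the shared final step (the real clean_variable may differ)."""
    # Initial-cleaning output, e.g. data/raw/gdp_wide.csv.
    wide = pd.read_csv(raw_dir / f"{name}_wide.csv")

    # Handle exclusions (assumes included_counties.csv lives next to the raw CSVs).
    included = pd.read_csv(raw_dir / "included_counties.csv")
    wide = wide[wide.GeoFIPS.isin(included.loc[included.Included, "GeoFIPS"])]

    # Standardize the value columns (illustrative min-max scaling).
    value_cols = wide.columns.difference(["GeoFIPS", "GeoName"])
    std = wide.copy()
    std[value_cols] = (wide[value_cols] - wide[value_cols].min()) / (
        wide[value_cols].max() - wide[value_cols].min()
    )

    # Long versions, assuming the non-ID columns are years.
    id_vars = ["GeoFIPS", "GeoName"]
    wide_long = wide.melt(id_vars=id_vars, var_name="Year", value_name="Value")
    std_long = std.melt(id_vars=id_vars, var_name="Year", value_name="Value")

    # The 4 CSVs: wide vs long, raw vs std.
    wide.to_csv(processed_dir / f"{name}_wide.csv", index=False)
    std.to_csv(processed_dir / f"{name}_std_wide.csv", index=False)
    wide_long.to_csv(processed_dir / f"{name}_long.csv", index=False)
    std_long.to_csv(processed_dir / f"{name}_std_long.csv", index=False)
```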

Steps to do:

  • Replace exclusions.pkl with a file included_counties.csv, with columns GeoFIPS, GeoName, Included, Explanation. Initially, leave Explanation blank -- fill it in while constructing datasets (see the sketch after this list).
  • Make sure variable units are in data_sources.ipynb
  • Some paths are hard-coded -- use find_repo_root instead.
  • In included_counties.csv, for counties where Included is False, Explanation should say something like "Missing 2012 unemployment data from bls.gov"
  • inspect_variable function for sanity-check plotting of variables. See inspect_data.ipynb in docs/experimental_notebooks (rough sketch after this list)
  • list_all_counties function (sketched alongside inspect_variable below)
  • Refactor the unemployment data pipeline in the way described above (currently, the initial cleaning step is in docs/experimental_notebooks/clean_unemployment.ipynb)
  • Refactor the spending variable pipelines in the way described above (see docs/experimental_notebooks/grants_from_fips.ipynb for the relevant notebook calls)
  • Refactor the other variable pipelines in the way described above
  • Functions for getting additional info about a particular county, e.g. grant_df_from_fips
  • Dataset stored as a SQL database (connect with Ria about how to structure it)
  • DataGrabber grabs from the SQL database in SQL-standard ways (illustrative sketch after this list)
  • Consistency tests (example below)
  • Add additional variables
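
For the included_counties.csv item, roughly how the file could be built from the current exclusions. The stand-in find_repo_root, the pickle contents, and the file locations are assumptions:

```python
import pickle
from pathlib import Path

import pandas as pd


def find_repo_root() -> Path:
    """Stand-in for the existing helper, so the sketch runs on its own."""
    path = Path.cwd().resolve()
    while path != path.parent and not (path / ".git").exists():
        path = path.parent
    return path


root = find_repo_root()

# Assumption: exclusions.pkl holds an iterable of excluded GeoFIPS codes.
with open(root / "data" / "raw" / "exclusions.pkl", "rb") as f:
    excluded_fips = set(pickle.load(f))

# Seed the county list from any initially-cleaned wide CSV, e.g. gdp_wide.csv.
counties = pd.read_csv(root / "data" / "raw" / "gdp_wide.csv", usecols=["GeoFIPS", "GeoName"])
counties["Included"] = ~counties.GeoFIPS.isin(excluded_fips)
# Leave Explanation blank for now; fill in reasons like
# "Missing 2012 unemployment data from bls.gov" while constructing datasets.
counties["Explanation"] = ""

counties.to_csv(root / "data" / "raw" / "included_counties.csv", index=False)
```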
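For inspect_variable and list_all_counties, the kind of shape I'd expect. The plotting choice (boxplot by year) and the long-CSV naming are placeholders; the real inspect_variable should follow inspect_data.ipynb:

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def list_all_counties(counties_csv: Path) -> pd.DataFrame:
    """All counties known to the dataset, read from included_counties.csv."""
    counties = pd.read_csv(counties_csv)
    return counties[["GeoFIPS", "GeoName"]].drop_duplicates().sort_values("GeoFIPS")


def inspect_variable(name: str, processed_dir: Path) -> None:
    """Sanity-check plot: distribution of a variable's values per year."""
    long_df = pd.read_csv(processed_dir / f"{name}_long.csv")
    long_df.boxplot(column="Value", by="Year", rot=90)
    plt.suptitle("")
    plt.title(f"{name}: value distribution by year")
    plt.show()
```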
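For the two SQL items, purely illustrative since the schema is still to be worked out with Ria -- the point is only that DataGrabber would issue ordinary SQL rather than read CSVs. The table layout and method name below are made up, not DataGrabber's current interface:

```python
import sqlite3

import pandas as pd


class DataGrabberSketch:
    """Illustrative only; assumes one long-format table per variable."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)

    def get_variable(self, name: str, fmt: str = "std_long") -> pd.DataFrame:
        # Hypothetical table naming, e.g. gdp_std_long with
        # GeoFIPS, GeoName, Year, Value columns.
        return pd.read_sql(f"SELECT * FROM {name}_{fmt}", self.conn)
```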
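And for the consistency tests, the flavor of pytest checks I have in mind, assuming the file layout and names used in the sketches above; the variable list is a placeholder:

```python
from pathlib import Path

import pandas as pd
import pytest

PROCESSED = Path("data/processed")  # assumed layout
VARIABLES = ["gdp"]  # placeholder; in practice, discover from data/processed


@pytest.mark.parametrize("name", VARIABLES)
def test_wide_and_long_agree(name):
    wide = pd.read_csv(PROCESSED / f"{name}_wide.csv")
    long_df = pd.read_csv(PROCESSED / f"{name}_long.csv")
    # Same counties in both shapes, and every (county, year) cell accounted for.
    assert set(wide.GeoFIPS) == set(long_df.GeoFIPS)
    n_years = len(wide.columns) - 2  # everything except GeoFIPS, GeoName
    assert len(long_df) == len(wide) * n_years


@pytest.mark.parametrize("name", VARIABLES)
def test_only_included_counties(name):
    wide = pd.read_csv(PROCESSED / f"{name}_wide.csv")
    included = pd.read_csv(Path("data/raw") / "included_counties.csv")
    allowed = included.loc[included.Included, "GeoFIPS"]
    assert wide.GeoFIPS.isin(allowed).all()
```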

emackev (Contributor, Author) commented Nov 13, 2023

riadas self-assigned this Nov 13, 2023
rfl-urbaniak (Contributor) commented:

Superseded by the current data pipeline.

rfl-urbaniak deleted the data_v2 branch November 15, 2024 09:24