As part of the 2018 Camp Fire developmental response project, we need to scrape California air sensor data to estimate pollutant exposure for our groups of interest. We're using CARB data from (this) site. We're scraping data on temperature, ozone, PM2.5, and NO2 to compare a 2019 control group, a 2018 in-utero exposed group, and a 2017 infant exposed group.
These data will be used to compare lifelong exposure to the above pollutants, and also to compare exposure during the 2018 Camp Fire against a 2019 control window.
This script runs on an outdated version of Firefox (Version 56), which should be downloaded before running the Python script. To help manage Python packages, install either Miniconda or Anaconda, using either the terminal or the installer app. Miniconda (a lighter version of Anaconda) can be found (here).
If you don't already have Git installed, it can be found here.
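To confirm the Firefox requirement before a long run, a small check along these lines may help. It is only a sketch: it assumes the browser can be launched as `firefox` from the terminal (on some systems the binary lives elsewhere and the path would need to be passed in), and the helper names are illustrative rather than part of this repository.

```python
import re
import subprocess

REQUIRED_MAJOR = 56  # the Firefox version named in this README


def firefox_major_version(version_output):
    """Extract the major version from text such as 'Mozilla Firefox 56.0.2'."""
    match = re.search(r"Firefox\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def installed_firefox_major(binary="firefox"):
    """Ask the Firefox binary for its version; return None if it cannot be run."""
    try:
        output = subprocess.check_output(
            [binary, "--version"], universal_newlines=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        # Binary not found, not executable, exited with an error, or timed out.
        return None
    return firefox_major_version(output)


if __name__ == "__main__":
    major = installed_firefox_major()
    if major is None:
        print("Could not determine the Firefox version (is 'firefox' on your PATH?).")
    elif major != REQUIRED_MAJOR:
        print("Found Firefox {}, but this README calls for version {}.".format(major, REQUIRED_MAJOR))
    else:
        print("Firefox {} found.".format(major))
```

If Firefox is installed somewhere that is not on your PATH, call `installed_firefox_major` with the full path to the binary instead.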
Make sure you have all of the above installed on your local computer (Version 56 of Firefox, Git, Python, and Miniconda) before building the environment and executing the script.
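The checklist above can be roughly automated by looking each tool up on your PATH. This is a minimal sketch, not part of the repository: the command names in `REQUIRED_TOOLS` are assumptions (your Python may be installed as `python3`, and Firefox may not be on your PATH at all), and it only checks that a command exists, not which version it is.

```python
import shutil

# Assumed command names for the prerequisites listed in this README.
REQUIRED_TOOLS = ("git", "conda", "python", "firefox")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

A tool reported as missing may still be installed; it may simply be somewhere your terminal does not search by default.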
For a more in-depth guide on installing the software and setting up the virtual environment, see here. The link leads to a different script, built by a previous member of the lab who was scraping data from the Webvitals servers.
First, clone this repository by executing the following in your terminal:
git clone https://github.com/aeguess/carb_scraper
To build the environment using Conda, execute the following in your terminal (run it from the directory that contains environment.yml):
conda env create -n carb_env -f environment.yml
followed by:
conda activate carb_env
This will build and activate the environment needed to use the correct versions of the Python packages. To then run the script, execute the following in your terminal (skip the cd step if you are already inside the repository folder):
cd carb_scraper
followed by:
python carb_pollutant_scraper.py
This will change the terminal's working directory to the cloned carb_scraper folder, and then run carb_pollutant_scraper.py. The script will take a few hours to complete; Firefox will open and begin automatically navigating the pages. The output will be added to your Downloads folder.
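Because the run takes hours and the output lands among everything else in Downloads, it can help to list the files that changed since the run started. The sketch below assumes the Downloads folder is at `~/Downloads` and makes no assumption about output file names or formats, which this README does not specify; `files_modified_since` is an illustrative helper, not part of the scraper.

```python
import time
from pathlib import Path


def files_modified_since(folder, since_timestamp):
    """Return files in `folder` modified at or after `since_timestamp`, newest first."""
    recent = [
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.stat().st_mtime >= since_timestamp
    ]
    return sorted(recent, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    downloads = Path.home() / "Downloads"  # assumed location of the Downloads folder
    one_day_ago = time.time() - 24 * 60 * 60
    if downloads.is_dir():
        for path in files_modified_since(downloads, one_day_ago):
            print(path.name)
    else:
        print("No Downloads folder found at {}".format(downloads))
```

Recording `time.time()` just before launching the scraper and passing that value in gives a tighter window than the 24-hour default used here.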