Merge pull request #133 from technologiestiftung/staging
feat: fetch daily weather data via BrightSky API (#132)
Jaszkowic authored Aug 19, 2024
2 parents 871d99a + 889967a commit 7f27de4
Showing 7 changed files with 196 additions and 4 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/test-harvest.yml
@@ -29,7 +29,8 @@ env:
MAPBOXLAYERNAME: "abc"
LOGGING: INFO
SKIP_MAPBOX: "True"
- # DATABASE_URL: postgresql://fangorn:ent@localhost:5432/trees?schema=public
+ WEATHER_HARVEST_LAT: "52.520008"
+ WEATHER_HARVEST_LNG: "13.404954"

jobs:
test-harvest:
@@ -57,14 +58,14 @@ jobs:
path: api
- uses: supabase/setup-cli@v1
with:
- version: 1.33.0
+ version: 1.136.3
- name: build the harvester
run: cd harvester && docker build --tag technologiestiftung/giessdenkiez-de-dwd-harvester:test .
- name: Start the api
id: api-start
run: cd api && supabase start | grep -w "service_role key" | cut -d ":" -f 2 | xargs | tr -d '\n' | awk '{print "service_role_key="$1}' >> "$GITHUB_OUTPUT" && cd ..
- name: run the harvester
- run: docker run --env PG_SERVER='0.0.0.0' --env SKIP_MAPBOX --env PG_DB --env PG_PORT --env PG_USER --env PG_PASS --env SUPABASE_URL --env SUPABASE_SERVICE_ROLE_KEY='${{ steps.api-start.outputs.service_role_key }}' --env LIMIT_DAYS='30' --env SURROUNDING_SHAPE_FILE='/app/assets/buffer.shp' --env SUPABASE_BUCKET_NAME --env MAPBOXTOKEN --env MAPBOXUSERNAME --env MAPBOXTILESET --env MAPBOXLAYERNAME --env LOGGING --env OUTPUT --network host technologiestiftung/giessdenkiez-de-dwd-harvester:test
+ run: docker run --env PG_SERVER='0.0.0.0' --env WEATHER_HARVEST_LAT --env WEATHER_HARVEST_LNG --env SKIP_MAPBOX --env PG_DB --env PG_PORT --env PG_USER --env PG_PASS --env SUPABASE_URL --env SUPABASE_SERVICE_ROLE_KEY='${{ steps.api-start.outputs.service_role_key }}' --env LIMIT_DAYS='30' --env SURROUNDING_SHAPE_FILE='/app/assets/buffer.shp' --env SUPABASE_BUCKET_NAME --env MAPBOXTOKEN --env MAPBOXUSERNAME --env MAPBOXTILESET --env MAPBOXLAYERNAME --env LOGGING --env OUTPUT --network host technologiestiftung/giessdenkiez-de-dwd-harvester:test
- name: stop the api
run: cd api && supabase stop && cd ..
release:
16 changes: 16 additions & 0 deletions README.md
@@ -112,6 +112,22 @@ Make sure to set the environment variables properly before running the script. M
- Preprocess trees.csv using `tippecanoe` library.
- Start the creation of updated Mapbox layer

### 4. Harvesting daily weather data
To harvest daily weather data, we use the free and open-source [BrightSky API](https://brightsky.dev/docs/#/). No API key is needed. The script is defined in [run_daily_weather.py](harvester/src/run_daily_weather.py).
Set all relevant environment variables before running the script, e.g. for a run against a local database:

```
PG_SERVER=localhost
PG_PORT=54322
PG_USER=postgres
PG_DB=postgres
PG_PASS=postgres
WEATHER_HARVEST_LAT=52.520008
WEATHER_HARVEST_LNG=13.404954
```

In particular, set `WEATHER_HARVEST_LAT` and `WEATHER_HARVEST_LNG` to the latitude and longitude of the location you want weather data for.
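
As a rough illustration of how the two coordinates end up in the BrightSky request, the sketch below builds the query parameters for the `https://api.brightsky.dev/weather` endpoint from a mapping of environment variables. The helper name `build_brightsky_params` and the range check are assumptions made for this example; the harvester script builds the parameters inline and does not validate them.

```python
import datetime
import os


def build_brightsky_params(date, environ=os.environ):
    """Build query parameters for a BrightSky /weather request.

    Hypothetical helper for illustration only; not part of the harvester.
    """
    lat = float(environ["WEATHER_HARVEST_LAT"])
    lng = float(environ["WEATHER_HARVEST_LNG"])
    # Assumed sanity check: reject coordinates outside the valid range
    if not -90 <= lat <= 90 or not -180 <= lng <= 180:
        raise ValueError("latitude/longitude out of range")
    # BrightSky expects the longitude under the key "lon"
    return {"date": date.isoformat(), "lat": lat, "lon": lng}


# Example values taken from the environment block above (Berlin)
example_env = {
    "WEATHER_HARVEST_LAT": "52.520008",
    "WEATHER_HARVEST_LNG": "13.404954",
}
params = build_brightsky_params(datetime.date(2024, 8, 19), example_env)
print(params)
```

Passing the environment as an argument keeps the example runnable without touching the real process environment.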

## Docker

To have a local database for testing you need Docker and docker-compose installed. You will also have to create a public Supabase Storage bucket. You also need to update the `.env` file with the values from `sample.env` below the line `# for your docker environment`.
2 changes: 2 additions & 0 deletions action.yml
@@ -79,3 +79,5 @@ runs:
SKIP_MAPBOX: ${{ inputs.SKIP_MAPBOX }}
LIMIT_DAYS: ${{ inputs.LIMIT_DAYS }}
SURROUNDING_SHAPE_FILE: ${{ inputs.SURROUNDING_SHAPE_FILE }}
WEATHER_HARVEST_LAT: ${{ inputs.WEATHER_HARVEST_LAT }}
WEATHER_HARVEST_LNG: ${{ inputs.WEATHER_HARVEST_LNG }}
2 changes: 1 addition & 1 deletion harvester/Dockerfile
@@ -31,4 +31,4 @@ COPY . /app/

RUN cd /app/prepare && SHAPE_RESTORE_SHX=YES python create-buffer.py

- CMD python /app/src/run_harvester.py
+ CMD /app/run_harvest.sh
16 changes: 16 additions & 0 deletions harvester/run_harvest.sh
@@ -0,0 +1,16 @@
#!/bin/sh
set -e

python /app/src/run_daily_weather.py || {
    echo "run_daily_weather.py failed"
    FAILED=1
}

python /app/src/run_harvester.py || {
    echo "run_harvester.py failed"
    FAILED=1
}

if [ "$FAILED" = "1" ]; then
    exit 1
fi
151 changes: 151 additions & 0 deletions harvester/src/run_daily_weather.py
@@ -0,0 +1,151 @@
import sys
import psycopg2.extras
import psycopg2
from dotenv import load_dotenv
import logging
import os
import requests
import datetime
from weather_utils import extract

# This script fetches hourly weather data from the BrightSky API, aggregates it to daily weather data and stores it in the database

# Set up logging
logging.basicConfig()
logging.root.setLevel(logging.INFO)

# Load the environment variables
load_dotenv()

# Check if all required environment variables are accessible
for env_var in [
    "PG_SERVER",
    "PG_PORT",
    "PG_USER",
    "PG_PASS",
    "PG_DB",
    "WEATHER_HARVEST_LAT",
    "WEATHER_HARVEST_LNG",
]:
    if env_var not in os.environ:
        logging.error("❌Environment variable {} does not exist".format(env_var))
        sys.exit(1)

PG_SERVER = os.getenv("PG_SERVER")
PG_PORT = os.getenv("PG_PORT")
PG_USER = os.getenv("PG_USER")
PG_PASS = os.getenv("PG_PASS")
PG_DB = os.getenv("PG_DB")
WEATHER_HARVEST_LAT = os.getenv("WEATHER_HARVEST_LAT")
WEATHER_HARVEST_LNG = os.getenv("WEATHER_HARVEST_LNG")

# Establish database connection
try:
    database_connection = psycopg2.connect(
        dbname=PG_DB, user=PG_USER, password=PG_PASS, host=PG_SERVER, port=PG_PORT
    )
    logging.info("🗄 Database connection established")
except Exception as error:
    logging.error(f"❌Could not establish database connection: {error}")
    sys.exit(1)

today = datetime.date.today()

# Calculate the start date three years (3 * 365 days) ago
three_years_ago = today - datetime.timedelta(days=3 * 365)

# Generate a list of all dates between the start date and today
date_list = [
    three_years_ago + datetime.timedelta(days=x)
    for x in range((today - three_years_ago).days + 1)
]

print(f"📅 Fetching weather data for {len(date_list)} days...")
weather_days_in_db = []
with database_connection.cursor() as cur:
    cur.execute("SELECT measure_day, day_finished FROM daily_weather_data;")
    weather_days_in_db = cur.fetchall()

for date in date_list:
    today = datetime.date.today()

    existing_weather_in_db_for_this_day = [
        data_point_in_db
        for data_point_in_db in weather_days_in_db
        if data_point_in_db[0].date() == date
    ]
    if existing_weather_in_db_for_this_day != []:
        logging.info(f"🌦 Weather data for {date} already exists in the database...")
        if existing_weather_in_db_for_this_day[0][1] == False:
            logging.info(
                f"🌦 Weather data for {date} was not finished in last run, updating now..."
            )
            with database_connection.cursor() as cur:
                # Delete the unfinished row for this date (not today's) so it is re-inserted below
                cur.execute(
                    "DELETE FROM daily_weather_data WHERE measure_day = %s", [date]
                )
            database_connection.commit()
        else:
            continue

    # Using BrightSky API to fetch weather data https://brightsky.dev/docs/#/
    # Hint: No API key is required
    url = "https://api.brightsky.dev/weather"
    params = {
        "date": date,
        "lat": WEATHER_HARVEST_LAT,
        "lon": WEATHER_HARVEST_LNG,
    }
    headers = {"Accept": "application/json"}
    response = requests.get(url, params=params, headers=headers)
    weather_raw = response.json()
    weather = weather_raw["weather"]

    # Aggregate hourly weather data to daily weather data
    sum_precipitation_mm_per_sqm = sum(extract(weather, "precipitation"))
    avg_temperature_celsius = sum(extract(weather, "temperature")) / len(weather)
    avg_pressure_msl = sum(extract(weather, "pressure_msl")) / len(weather)
    sum_sunshine_minutes = sum(extract(weather, "sunshine"))
    avg_wind_direction_deg = sum(extract(weather, "wind_direction")) / len(weather)
    avg_wind_speed_kmh = sum(extract(weather, "wind_speed")) / len(weather)
    avg_cloud_cover_percentage = sum(extract(weather, "cloud_cover")) / len(weather)
    avg_dew_point_celcius = sum(extract(weather, "dew_point")) / len(weather)
    avg_relative_humidity_percentage = sum(extract(weather, "relative_humidity")) / len(
        weather
    )
    avg_visibility_m = sum(extract(weather, "visibility")) / len(weather)
    avg_wind_gust_direction_deg = sum(extract(weather, "wind_gust_direction")) / len(
        weather
    )
    avg_wind_gust_speed_kmh = sum(extract(weather, "wind_gust_speed")) / len(weather)

    source_dwd_station_ids = extract(weather_raw["sources"], "dwd_station_id")

    day_finished = date < today

    logging.info(f"🌦 Weather data for {date} fetched via BrightSky API...")

    with database_connection.cursor() as cur:
        cur.execute(
            "INSERT INTO daily_weather_data (measure_day, day_finished, sum_precipitation_mm_per_sqm, avg_temperature_celsius, avg_pressure_msl, sum_sunshine_minutes, avg_wind_direction_deg, avg_wind_speed_kmh, avg_cloud_cover_percentage, avg_dew_point_celcius, avg_relative_humidity_percentage, avg_visibility_m, avg_wind_gust_direction_deg, avg_wind_gust_speed_kmh, source_dwd_station_ids) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            [
                date,
                day_finished,
                sum_precipitation_mm_per_sqm,
                avg_temperature_celsius,
                avg_pressure_msl,
                sum_sunshine_minutes,
                avg_wind_direction_deg,
                avg_wind_speed_kmh,
                avg_cloud_cover_percentage,
                avg_dew_point_celcius,
                avg_relative_humidity_percentage,
                avg_visibility_m,
                avg_wind_gust_direction_deg,
                avg_wind_gust_speed_kmh,
                source_dwd_station_ids,
            ],
        )

    database_connection.commit()
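
A note on the aggregation in this script: the averages divide by `len(weather)`, while `extract` drops `None` values first. Hours with a missing reading therefore pull the mean towards zero, and an empty `weather` list would raise `ZeroDivisionError`. The sketch below shows one possible alternative, assuming it is acceptable to average only over the hours that actually reported a value. `mean_or_none` is a hypothetical helper and not part of this commit.

```python
def extract(weather_list, field):
    # Same behaviour as weather_utils.extract: keep only non-None values
    return [dp[field] for dp in weather_list if dp[field] is not None]


def mean_or_none(values):
    """Average over the present values; None when nothing was reported.

    Hypothetical helper for illustration only.
    """
    return sum(values) / len(values) if values else None


# Three hourly records, one of them without a temperature reading
sample_hours = [
    {"temperature": 10.0},
    {"temperature": None},
    {"temperature": 20.0},
]

avg_temperature = mean_or_none(extract(sample_hours, "temperature"))
print(avg_temperature)  # 15.0, whereas dividing by len(sample_hours) would give 10.0
print(mean_or_none([]))  # None instead of a ZeroDivisionError
```

Whether `None` is an acceptable result depends on the `daily_weather_data` schema: psycopg2 passes `None` as SQL `NULL`, so the affected columns would have to be nullable.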
6 changes: 6 additions & 0 deletions harvester/src/weather_utils.py
@@ -0,0 +1,6 @@
def extract(weather_list, field):
    return [
        data_point[field]
        for data_point in weather_list
        if data_point[field] is not None
    ]
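
To make the behaviour of `extract` concrete, here is a small usage sketch on made-up hourly records. It also shows an edge case: `extract` indexes with `data_point[field]`, so a record that lacks the key entirely raises `KeyError`. The `extract_lenient` variant is a hypothetical alternative for that case and is not part of this commit.

```python
def extract(weather_list, field):
    return [
        data_point[field]
        for data_point in weather_list
        if data_point[field] is not None
    ]


def extract_lenient(weather_list, field):
    # Hypothetical variant: also skips records that lack the key entirely
    return [
        data_point[field]
        for data_point in weather_list
        if data_point.get(field) is not None
    ]


# Made-up sample data; the last record has no "sunshine" key
hours = [
    {"precipitation": 0.5, "sunshine": 12.0},
    {"precipitation": None, "sunshine": 30.0},
    {"precipitation": 1.5},
]

print(sum(extract(hours, "precipitation")))  # 2.0, the None entry is skipped
print(sum(extract_lenient(hours, "sunshine")))  # 42.0, the missing key is skipped

try:
    extract(hours, "sunshine")
except KeyError as error:
    print(f"extract raised KeyError for missing key {error}")
```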