Arctic Rain on Snow Study (AROSS) Stations Database

Reads Automated Surface Observing System (ASOS) data from disk on the NSIDC archive and builds a temporally and geospatially indexed database for quickly searching events.

Note

TODO: Is this data available publicly and documented? How is it produced? Links!

Part of the AROSS Stations project.

Usage

To get started quickly, install Docker.

Important

Instructions that follow presume the current working directory is the root of this repository unless otherwise stated.

Dev quickstart

‼️ Don't worry about this unless you intend to change the code!

View the contributing docs for more details!

Set up the development compose configuration to be automatically loaded:

ln -s compose.dev.yml compose.override.yml

Before starting the containers: dev environment setup

You will need local tooling like Nox and pre-commit to do development. Use whatever Python version management tool you prefer (Conda, VirtualEnv, PyEnv, ...) to create a virtual environment, then install this package and its dev dependencies:

pip install --editable ".[dev]"

Important

Do this step before starting the stack in dev mode, or you may encounter an error (in which case, see the troubleshooting section for explanation!).

Debugging

You may wish to run the API process from an attached shell for interactive debugging. You can set up the relevant container to "sleep" in compose.dev.yml:

  api:
    <<: *dev-common
    entrypoint: "sleep"
    command: ["9999999"]
    # command: ["dev", "--host", "0.0.0.0", "./src/aross_stations_db/api"]

Then you can manually run the dev server interactively:

docker compose exec api fastapi dev --host 0.0.0.0 ./src/aross_stations_db/api

From here, you can interactively pause at any breakpoint() calls in the Python code.
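As a minimal sketch of what that looks like, the function below is a hypothetical stand-in for any code path the API executes; the name `count_rain_hours` and its logic are illustrative and not taken from this project. `breakpoint()` is the Python built-in, and setting `PYTHONBREAKPOINT=0` in the environment turns every call into a no-op.

```python
def count_rain_hours(hourly_flags):
    """Hypothetical helper: count the hours flagged as rain."""
    total = 0
    for is_rain in hourly_flags:
        if is_rain:
            total += 1
    # Execution pauses here and drops into the debugger in the attached shell.
    # With PYTHONBREAKPOINT=0 set in the environment, this line does nothing.
    breakpoint()
    return total
```

Remember to remove the call (or leave `PYTHONBREAKPOINT=0` set) before running the service non-interactively, since a paused process will not answer requests.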

UI development

The instructions below specify starting the stack with the --profile ui option. If you wish to develop in the user interface code repository, you should omit that flag and follow the instructions in the UI repo to develop.

Set envvars

Create a .env file or otherwise export the required envvars. You can use our sample environment file as a starting point, and modify as you see fit:

cp .env.sample .env

Important

$AROSS_DATA_BASEDIR should be Andy's data directory containing expected "metadata" and "events" subdirectories. TODO: Document how that data is created! How can the public access it?
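A quick sanity check along these lines can confirm the layout before starting the stack. This is a minimal sketch: the function name is hypothetical, and only the two subdirectory names come from this README.

```python
from pathlib import Path

# Subdirectory names come from this README; everything else is illustrative.
EXPECTED_SUBDIRS = ("metadata", "events")


def missing_subdirs(basedir):
    """Return the expected subdirectories that are absent under basedir."""
    base = Path(basedir)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]
```

An empty list means both subdirectories are present. Pass in the same path you set for `AROSS_DATA_BASEDIR`; the sketch does not read the .env file itself.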

Note

The connection string shown here is for connecting within the Docker network to a container with the hostname db.

Start the application stack

The stack is configured within compose.yml and includes containers:

  • aross-stations-db: A PostGIS database for quickly storing and accessing event records.
  • aross-stations-admin: An Adminer container for inspecting the database in the browser.
  • aross-stations-api: An HTTP API for accessing data in the database.

Start the stack:

docker compose --profile ui up --pull=always --detach

Important

If you've pulled the images before, you may need to fetch new ones! Bring down the running containers:

docker compose down --remove-orphans

...then run the "up" command again.
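Because `--detach` returns control while the services are still starting, it can help to wait for the API to answer before moving on. The sketch below polls a URL using only the standard library; the function name and the default timings are assumptions, and the URL you pass in (for example the API docs page at http://localhost:8000/docs) depends on your port mappings.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout_s=60.0, interval_s=2.0):
    """Poll url until it answers or timeout_s elapses; return True if it answered."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # interval_s doubles as the per-request timeout.
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server replied, just not with a success status; it is up.
            return True
        except OSError:
            # Connection refused, timed out, and similar: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For instance, `wait_until_up("http://localhost:8000/docs")` returns `True` as soon as the API container responds, or `False` after about a minute.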

Inspect the database

You can use the included Adminer container for quick inspection. Navigate in your browser to http://localhost:8080 and enter:

| Field | Value |
| --- | --- |
| System | PostgreSQL |
| Server | aross-stations-db |
| Username | aross |
| Password | Whatever you specified in the environment variable |
| Database | aross |

Note

At this point, the database is empty. We're just verifying we can connect. Continue to ingest next!

Run ingest

docker compose run cli init

From a fast disk, this should take under 2 minutes.

✨ Check out the data!

Now, you can use Adminer's SQL Query menu to select some data:

Example SQL query

This query returns 13 results at the time of this writing, but it may return more in the future.

select event.*
from event
join station on event.station_id = station.id
where
  ST_Within(
    station.location,
    ST_SetSRID(
      ST_GeomFromText('POLYGON ((-159.32130625160698 69.56469019745796, -159.32130625160698 68.08208920517862, -150.17196253090276 68.08208920517862, -150.17196253090276 69.56469019745796, -159.32130625160698 69.56469019745796))'),
      4326
    )
  )
  AND event.time_start > '2023-01-01'::date
  AND event.time_end < '2023-06-01'::date
  AND event.snow_on_ground
  AND event.rain_hours >= 1
;

Or you can check out the API docs in your browser at http://localhost:8000/docs or submit an HTTP query:

Example HTTP query

http://localhost:8000/v1/stations?start=2023-01-01&end=2023-06-01&polygon=POLYGON%20((-159.32130625160698%2069.56469019745796,%20-159.32130625160698%2068.08208920517862,%20-150.17196253090276%2068.08208920517862,%20-150.17196253090276%2069.56469019745796,%20-159.32130625160698%2069.56469019745796))
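Hand-encoding a WKT polygon into a URL is error-prone, so here is a hedged sketch that builds the same kind of query with Python's standard library. The endpoint path and the `start`, `end`, and `polygon` parameter names are taken from the example URL above; the function name `stations_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

API_BASE = "http://localhost:8000"  # host and port as used in this README


def stations_url(start, end, polygon_wkt, base=API_BASE):
    """Build a /v1/stations query URL with every parameter percent-encoded."""
    params = {"start": start, "end": end, "polygon": polygon_wkt}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+").
    return f"{base}/v1/stations?{urlencode(params, quote_via=quote)}"
```

This encodes parentheses and commas as well as spaces, so the result looks stricter than the URL above but decodes to the same parameter values.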

View logs

In this example, we view and follow logs for the api service:

docker compose logs --follow api

You can replace api with any other service name, or omit it to view logs for all services.

View UI

Navigate to http://localhost:80.

Experiment in JupyterLab

This repository provides a demo notebook to experiment with the API. In your browser, navigate to http://localhost:8888. The password is the same as the database password you set earlier.

Cleanup

Shutdown

docker compose down

Cleanup

Database

There is no need to remove the _data/ directory to start over with a fresh database; the init CLI command will do that for you! However, if you want to completely remove the database to save space on your system, you may want to delete the _data/ directory.

Containers and images

# Bring down containers, even if a service name has changed
docker compose down --remove-orphans
# Clean up all unused images aggressively
docker system prune -af

Troubleshooting

Permission denied errors on API startup

When this error occurs, the webserver still responds to queries, but hot-reloading doesn't work.

You may need to grant read access to the _data/ directory if you're running locally. FastAPI's hot-reloading functionality in dev needs to watch the current directory for changes, and I don't know of a way to make it ignore this directory, which is usually not readable. The directory is likely owned by root, assuming it was created automatically by Docker, so you may need to use sudo:

sudo chmod -R ugo+r _data
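To see which paths are affected before or after changing permissions, a small walk like the following can help. It is a sketch using only the standard library; the function name is hypothetical and the default root simply mirrors the `_data` directory named above.

```python
import os


def unreadable_paths(root="_data"):
    """Return paths under root that the current user cannot read."""
    blocked = []

    def note_error(err):
        # os.walk reports directories it cannot list through this callback.
        if isinstance(err, PermissionError):
            blocked.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=note_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                blocked.append(path)
    return blocked
```

An empty list suggests the file watcher should be able to traverse the directory. Run it as the same user that runs the dev server, since root can typically read everything.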

API fails to start in dev with No module named 'aross_stations_db._version'

Unfortunately, this project doesn't work perfectly with Docker for development yet. This is because our project configuration (pyproject.toml) is set up to dynamically generate version numbers from source control at build time:

[tool.hatch]
version.source = "vcs"
build.hooks.vcs.version-file = "src/aross_stations_db/_version.py"

If you freshly clone this project and immediately start up the Docker containers in dev mode, the dynamically generated version module, _version.py, won't exist yet in the source directory (because it is git-ignored). The source directory is then mounted into the Docker container, shadowing the pre-built source directory in the image that does (well, it did until it was shadowed 😉) include _version.py.

It's very important to complete the initial setup step of creating a local environment and installing the package and its development dependencies if you plan to be doing development. This will also give you Nox and pre-commit for automating development tasks.
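A pre-flight check for this condition could look like the sketch below. The relative path is the `version-file` value from the pyproject.toml excerpt above; the function name is hypothetical.

```python
from pathlib import Path

# Path taken from the version-file setting in pyproject.toml.
VERSION_FILE = Path("src/aross_stations_db/_version.py")


def version_module_exists(repo_root="."):
    """Return True if the generated _version.py is present in the source tree."""
    return (Path(repo_root) / VERSION_FILE).is_file()
```

If this returns `False` from the repository root, complete the local install step described above before starting the stack in dev mode.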