Contents
- About
- Development
PlantIT is a framework for deploying research apps to high-performance/-throughput clusters. Specifically, it's a science gateway for image-based plant phenotyping. Future work may generalize the platform for any domain.
PlantIT debuted (with pre-release v0.1.0) at NAPPN 2022 (Feb. 22-25). See Releases for the changelog and Roadmap for planned features and fixes. Capabilities will likely continue to evolve for some time, with "official" releases to follow a forthcoming publication.
High-throughput phenotyping is resource-intensive and often demands virtual computing resources. This presents deployment challenges related to packaging and portability, and raises barriers to entry. Research software should:
- be highly configurable when necessary
- automate deployment details where possible
- let users (and developers) focus on the problem domain
PlantIT aims to bridge two user groups: researchers and developers. Of course one may wear both hats. The idea is an open-source conveyor belt for scientific software: make it easier to 1) package and share research applications, and 2) deploy them to clusters.
PlantIT is just glue between version control, data storage, a container engine, and a cluster scheduler. To publish an app, containerize it (e.g., write a Dockerfile) and add a `plantit.yaml` file to the GitHub repository. Then run it from the browser with a few clicks.
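A minimal `plantit.yaml` might look something like the sketch below (the field names here are illustrative assumptions, not the authoritative schema; consult the project documentation for the real format):

```yaml
# hypothetical minimal configuration; exact keys may differ from the real schema
name: hello-world                  # display name shown in the browser
author: Jane Researcher            # maintainer
image: docker://alpine             # container image the workflow runs in
commands: echo "Hello, world!"     # entry point executed inside the container
```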
Read on if you're interested in contributing to `plantit` or hosting your own instance somewhere.
The following are required to develop or deploy `plantit` in a Unix environment: at minimum, Docker and Docker Compose (the development workflow below runs everything in containers).
First, clone the repository:

```bash
git clone https://github.com/Computational-Plant-Science/plantit.git
```
To set up a new (or restore a clean) development environment, run `scripts/bootstrap.sh` from the project root (you may need to `chmod +x` it first). Use the `-n` option to disable the Docker build cache (see the usage sketch after the list below). This command will:
- Stop and remove project containers and networks
- Generate an `.env` file (to configure environment variables) with default values, if one does not exist
- Build the Vue front end
- Build Docker images
- Run migrations
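For instance, a typical invocation from the project root:

```bash
chmod +x scripts/bootstrap.sh   # if the script is not already executable
./scripts/bootstrap.sh          # standard bootstrap
./scripts/bootstrap.sh -n       # same, but bypass the Docker build cache
```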
Then bring everything up with:

```bash
docker-compose -f docker-compose.dev.yml up    # add -d for detached mode
```
This will start a number of containers:
- `plantit`: Django web application (http://localhost:3000)
- `postgres`: PostgreSQL database
- `celery`: Celery prefork worker
- `celerye`: Celery eventlet worker
- `flower`: Flower web UI for Celery (http://localhost:5555)
- `redis`: Redis instance (caching, Celery message broker)
- `sandbox`: Ubuntu test environment
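To check that all of these services came up, list them with Docker Compose:

```bash
docker-compose -f docker-compose.dev.yml ps
```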
The Django admin interface is at http://localhost:3000/admin/. To use it, you'll need to log into the site at least once (this will create a Django account for you), then shell into the `plantit` container, run `./manage.py shell`, and update your profile with staff/superuser privileges. For instance:
```python
from django.contrib.auth.models import User

user = User.objects.get(username="<your CyVerse username>")
user.is_staff = True
user.is_superuser = True
user.save()
```
You can also run `./scripts/configure-superuser.sh -u <your CyVerse username>` to accomplish the same thing.
Note that the bootstrap script will not clear migrations. To restore a totally clean database state, you will need to remove all `*.py` files from the `plantit/plantit/migrations` directory (except `__init__.py`).
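One way to do this is a `find` one-liner (a convenience command, not part of the project scripts):

```bash
# delete all migration modules except the package initializer
find plantit/plantit/migrations -name "*.py" ! -name "__init__.py" -delete
```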
Once the containers are up, run the tests with:

```bash
docker-compose -f docker-compose.dev.yml exec plantit ./manage.py test
```
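Django's test runner also accepts labels to narrow the run, so something like the following should work for a single module (the label here is hypothetical; substitute the module you're working on):

```bash
docker-compose -f docker-compose.dev.yml exec plantit ./manage.py test plantit.tests
```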
To test remote job submissions with a local development version of the `plantit` web application, you will need a service of some kind to accept job completion signals and forward them to your machine's `localhost`. One convenient tool is ngrok. After downloading ngrok and adding the executable to your path, run `ngrok http 3000` to start a tunnel, then set the `DJANGO_API_URL` variable in `.env` to the URL ngrok reports it's listening on.
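For example (the forwarding URL below is a placeholder; use whatever ngrok actually prints):

```bash
ngrok http 3000
# ngrok prints a forwarding URL, e.g. https://abc123.ngrok.io; then, in .env:
#   DJANGO_API_URL=https://abc123.ngrok.io/apis/v1/
```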
In the production configuration, NGINX serves static assets and reverse-proxies the Django application, which runs behind Gunicorn (Django and Gunicorn share a container).
To configure PlantIT for deployment, first clone the repo, then run the deployment script from the root directory:

```bash
chmod +x scripts/deploy.sh
./scripts/deploy.sh <configuration ('rc' or 'prod')> <host IP or FQDN> <admin email address>
```
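For example (hosts and email address are placeholders):

```bash
./scripts/deploy.sh rc staging.example.org admin@example.org     # release-candidate configuration
./scripts/deploy.sh prod plantit.example.org admin@example.org   # production configuration
```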
This script is idempotent and may safely be triggered by, e.g., a CI/CD server. It will:
- Bring containers down
- Fetch the latest version of the project
- Pull the latest versions of Docker images
- Build the Vue front end
- Collect static files
- Configure NGINX (replacing `localhost` in `config/nginx/conf.d/local.conf` with the host's IP or FQDN, configured via environment variable)
- Update environment variables (disable debugging, enable SSL and secure cookies, etc.)
- Bring containers up
- Run migrations
At this point the following containers should be running:
- `nginx`: NGINX server (reverse proxy)
- `plantit`: Django web application behind Gunicorn (http://localhost:80)
- `postgres`: PostgreSQL database
- `celery`: Celery background worker
- `redis`: Redis instance
PlantIT uses Let's Encrypt and Certbot for SSL certification. The production configuration includes a `certbot` container which can be used to request new certificates from Let's Encrypt or renew existing ones. Standard certificates last 90 days. In production, the `certbot` container is configured by default to automatically renew certificates when necessary:
```yaml
certbot:
  image: certbot/certbot
  volumes:
    - ./config/certbot/conf:/etc/letsencrypt/
    - ./config/certbot/www:/var/www/certbot
  entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 24h & wait $${!}; done;'"
```
To manually request a new certificate, run:

```bash
docker-compose -f docker-compose.prod.yml run certbot
```
To renew an existing certificate, use the `renew` command, then restart all containers:

```bash
docker-compose -f docker-compose.prod.yml run certbot renew
docker-compose -f docker-compose.prod.yml restart
```
Use the `--dry-run` flag with any command to test without writing anything to disk.
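For instance, to rehearse a renewal without touching the certificates on disk:

```bash
docker-compose -f docker-compose.prod.yml run certbot renew --dry-run
```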
Docker will read environment variables in the following format from a file named `.env` in the project root directory (if the file exists):

```
key=value
key=value
...
```
`scripts/bootstrap.sh` will generate an `.env` file like the following if one does not exist:
```
VITE_TITLE=plantit
MAPBOX_TOKEN=<your Mapbox token>
MAPBOX_FEATURE_REFRESH_MINUTES=60
CYVERSE_REDIRECT_URL=http://localhost:3000/apis/v1/idp/cyverse_handle_temporary_code/
CYVERSE_CLIENT_ID=<your cyverse client id>
CYVERSE_CLIENT_SECRET=<your cyverse client secret>
CVVERSE_USERNAME=<your cyverse username>
CYVERSE_PASSWORD=<your cyverse password>
CYVERSE_TOKEN_REFRESH_MINUTES=60
NODE_ENV=development
DJANGO_SETTINGS_MODULE=plantit.settings
DJANGO_SECRET_KEY=<your django secret key>
DJANGO_DEBUG=True
DJANGO_API_URL=http://plantit:3000/apis/v1/
DJANGO_SECURE_SSL_REDIRECT=False
DJANGO_SESSION_COOKIE_SECURE=False
DJANGO_CSRF_COOKIE_SECURE=False
DJANGO_ALLOWED_HOSTS=*
DJANGO_ADMIN_USERNAME=<your django admin username>
DJANGO_ADMIN_PASSWORD=<your django admin password>
DJANGO_ADMIN_EMAIL=<your django admin email>
CELERY_EVENTLET_QUEUE=eventlet
USERS_CACHE=/code/users.json
USERS_REFRESH_MINUTES=60
USERS_STATS_REFRESH_MINUTES=10
STATS_WINDOW_WIDTH_DAYS=30
MORE_USERS=/code/more_users.json
AGENT_KEYS=/code/agent_keys
WORKFLOWS_CACHE=/code/workflows.json
WORKFLOWS_REFRESH_MINUTES=60
TASKS_LOGS=/code/logs
TASKS_TIMEOUT_MULTIPLIER=2
TASKS_STEP_TIME_LIMIT_SECONDS=20
LAUNCHER_SCRIPT_NAME=launch
INPUTS_FILE_NAME=inputs.txt
ICOMMANDS_IMAGE=computationalplantscience/icommands
SQL_ENGINE=django.db.backends.postgresql
SQL_HOST=postgres
SQL_PORT=5432
SQL_NAME=postgres
SQL_USER=postgres
SQL_PASSWORD=<your database password>
GITHUB_AUTH_URI=https://github.com/login/oauth/authorize
GITHUB_REDIRECT_URI=http://localhost:3000/apis/v1/users/github_handle_temporary_code/
GITHUB_SECRET=<your github secret>
GITHUB_CLIENT_ID=<your github client ID>
DOCKER_USERNAME=<your docker username>
DOCKER_PASSWORD=<your docker password>
NO_PREVIEW_THUMBNAIL=/code/plantit/front_end/src/assets/no_preview_thumbnail.png
AWS_ACCESS_KEY=<your AWS access key>
AWS_SECRET_KEY=<your AWS secret key>
AWS_REGION=<your AWS region>
AWS_FEEDBACK_ARN=<your AWS feedback ARN>
AGENTS_HEALTHCHECKS_MINUTES=5
AGENTS_HEALTHCHECKS_SAVED=12
TUTORIALS_FILE=/code/tutorials.pdf
FEEDBACK_FILE=/code/feedback.pdf
CELERY_AUTH=user:password
HTTP_TIMEOUT=15
CURL_IMAGE=curlimages/curl
GH_USERNAME=<your github username>
FIND_STRANDED_TASKS=True
```
Note that the following environment variables must be supplied manually:
- `MAPBOX_TOKEN`
- `CYVERSE_CLIENT_ID`
- `CYVERSE_CLIENT_SECRET`
- `CVVERSE_USERNAME`
- `CYVERSE_PASSWORD`
- `GITHUB_CLIENT_ID`
- `GITHUB_SECRET`
- `AWS_ACCESS_KEY`
- `AWS_SECRET_KEY`
- `AWS_REGION`
- `AWS_FEEDBACK_ARN`
Several others will be auto-generated by `scripts/bootstrap.sh` in a clean install directory:
- `DJANGO_ADMIN_PASSWORD`
- `DJANGO_SECRET_KEY`
- `SQL_PASSWORD`
Some variables must be reconfigured for production environments (`scripts/deploy.sh` will do so automatically):
- `NODE_ENV` should be set to `production`
- `DJANGO_DEBUG` should be set to `False`
- `DJANGO_SECURE_SSL_REDIRECT` should be set to `True`
- `DJANGO_API_URL` should point to the host's IP or FQDN
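In other words, a production `.env` should differ from the development defaults roughly as follows (the `DJANGO_API_URL` value is illustrative):

```
NODE_ENV=production
DJANGO_DEBUG=False
DJANGO_SECURE_SSL_REDIRECT=True
DJANGO_API_URL=https://<host>/apis/v1/
```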
An agent is an abstraction of a computing resource, such as a cluster or supercomputer. `plantit` interacts with agents via key-authenticated SSH and requires the SLURM scheduler to be installed. (Support for additional schedulers is in development.)
Deployment targets may be configured programmatically or with the Django admin interface. To configure an agent via the admin site, make sure you're logged into `plantit`, then navigate to http://localhost:3000/admin/ (https://<host>/admin/ in production). Select the `Agents` tab on the left side of the screen, then `Add Agent`.
On many clusters it is customary to configure dependencies on a per-user basis with a module system, e.g., `module load <some software>`. The `pre_commands` agent property is the place for commands like these: when provided, they will be prepended to all commands `plantit` sends to the cluster for job orchestration.
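A plausible `pre_commands` value might look like the following (module names are hypothetical and vary by cluster; check `module avail` on yours):

```
module load singularity
```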
`plantit` deployment targets must run a Linux distribution with either the `sh` or `bash` shell available. Only two dependencies are required:
- SLURM
- Singularity
`plantit` tasks expect standard SLURM commands (e.g., `sbatch`, `scancel`) to be available. Singularity must also be installed and available on the `$PATH`.
Docker Hub applies rate limits to unauthenticated users, and these are easy to meet or exceed, since Singularity queries the Docker API on each `singularity exec docker://<some container>`. It is recommended to run `singularity remote login --username <your Docker username> docker://docker.io` with a paid Docker account: this will cache your Docker credentials on the deployment target for Singularity to use thereafter.
To build the Sphinx documentation locally, use:

```bash
docker run -v $(pwd):/opt/dev -w /opt/dev computationalplantscience/plantit sphinx-build -b html docs docs_output
```
The DIRT migration feature allows users of the original DIRT web application to migrate their data to `plantit`. To test this feature, you will need access to the DIRT server and database. The following environment variables must be set:
- `DIRT_MIGRATION_DATA_DIR`: the directory on the DIRT server where DIRT data is stored
- `DIRT_MIGRATION_HOST`: the hostname of the DIRT server
- `DIRT_MIGRATION_PORT`: the SSH port of the DIRT server
- `DIRT_MIGRATION_USERNAME`: the SSH username for the DIRT server
- `DIRT_MIGRATION_DB_HOST`: the hostname of the DIRT database server
- `DIRT_MIGRATION_DB_PORT`: the port of the DIRT database server
- `DIRT_MIGRATION_DB_USER`: the username of the DIRT database user
- `DIRT_MIGRATION_DB_DATABASE`: the name of the DIRT database
- `DIRT_MIGRATION_DB_PASSWORD`: the DIRT database password
An SSH tunnel must also be opened to the DIRT server, as the database is not open to external connections. For instance, to forward the DIRT server's database port 3306 to port 3306 on a development machine:

```bash
ssh -L 3306:localhost:3306 -p <DIRT server SSH port> <your cyverse username>@<DIRT server IP or FQDN>
```

On some Linux systems it may be necessary to substitute the loopback IP address 127.0.0.1 for `localhost`.
Be sure to set `DIRT_MIGRATION_DB_HOST=host.docker.internal` to point the Docker containers at the host's loopback/localhost address.
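Putting the pieces together, the database-related values in `.env` might then look like this (ports illustrative, matching the tunnel above):

```
DIRT_MIGRATION_DB_HOST=host.docker.internal
DIRT_MIGRATION_DB_PORT=3306
```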
Some extra configuration is necessary on Linux systems to allow containers to access services running on the local host. The `docker-compose.dev.yml` configuration file gives the `plantit`, `celery`, and `celerye` containers the `extra_hosts` option:
```yaml
extra_hosts:
  - "host.docker.internal:host-gateway"
```
This is only necessary on Linux; on Mac and Windows, the `host.docker.internal` hostname is configured automatically. See this post for more information.