A template for generating Dockerized implementations of Spyglass exports

LorenFrankLab/spyglass-export-docker

 
 

Dockerizing a Spyglass Export

About

Spyglass is an open-source framework for managing and analyzing data in neuroscience research. The Export feature allows users to generate scripts (see footnote 1) that recreate both their conda environment and the database, as well as upload data to the DANDI Archive.

This repository is intended to be used with Docker to create and share a reproducible environment for replicating a paper's analyses.

Quick Start

  1. Pre-requisites: make and docker.
    • make is available on most Unix systems as part of GNU Make, or available with choco (Windows) or brew (macOS).
    • Docker builds, runs, and manages containers.
  2. Register for Docker Hub and run docker login.
  3. Clone this repository to your local machine.
  4. Copy env.example to .env and edit the values.
  5. Copy the paper's notebooks to notebooks/ (see footnote 2).
  6. Remove items from the environment.yml that require GPU support like jax.
  7. Run make build to build the docker image.
  8. Navigate to http://localhost:8888/lab, using the paper ID as the password.
  9. Test the notebooks.
  10. Run make publish to publish the image.
  11. Share the image with collaborators, who can run make run to start the container and visit the same URL. They will need:
    1. The .env file you used.
    2. The docker-compose-collab.yml file for building from the published images.
    3. The Makefile for the command to run the container from the published images.
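Before starting, it can help to confirm that the prerequisites from step 1 are on your PATH. The following is a minimal sketch, not part of this repository; the function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("make", "docker")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("make and docker are both available.")
```

An empty list means both tools were found. This does not check that the Docker daemon is running or that docker login has succeeded.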

Overview

  • Makefile: Contains commands for building and publishing the docker image.
    • copy_files: Copies the export sql and yml files to the export_data/ directory.
    • down: Stops and removes existing docker containers.
    • up: Runs down, then starts the docker container.
    • build: Alias for up.
    • enter: Enters the running docker container for debugging.
  • docker-compose.yml: Defines the docker containers and volumes.
    • db: Service. MySQL database container.
    • hub: Service. Jupyter notebook server container.
    • conda: Volume. Cache of the hub's conda environment.
    • db_data: Volume. Cache of the database's data.
  • docker-compose-collab.yml: Similar to docker-compose.yml, but using the hub image from Docker Hub. This file is intended for collaborators.
  • Dockerfile: Adds additional instructions to the hub container.
    • Copies in datajoint and jupyter configuration files.
    • Installs git for possible git installs in the conda environment. For a faster build time, remove this line if no such installs are needed.
    • Installs the paper's conda environment.
    • Runs entrypoint.py to configure the datajoint connection.
  • env.example: Example environment variables for the .env file. Must be copied to .env and edited.
  • config: Contains additional configuration files.
    • .datajoint_config.py: Default configuration for the datajoint connection.
    • entrypoint.py: Edits the datajoint config based on environment variables.
    • entrypoint_db.sh: Loads exported sql files. Run by the db service.
    • jupyter_server_config.py: Configures the jupyter notebook server.
      • Sets the default kernel to the paper's conda environment.
      • Sets the server password.
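
To illustrate the idea behind entrypoint.py (editing the datajoint config based on environment variables), here is a minimal sketch. It is not the repository's actual script. The variable names MYSQL_HOST, MYSQL_USER and MYSQL_ROOT_PASSWORD are borrowed from the Elevated Access section, MYSQL_PORT is hypothetical, and the fallback values are assumptions.

```python
import os


def datajoint_settings(env=None):
    """Build DataJoint connection settings from environment variables.

    Variable names and fallback values are assumptions for illustration;
    check your own .env file and entrypoint.py for the real ones.
    """
    env = os.environ if env is None else env
    return {
        "database.host": env.get("MYSQL_HOST", "db"),
        "database.user": env.get("MYSQL_USER", "root"),
        "database.password": env.get("MYSQL_ROOT_PASSWORD", ""),
        # MYSQL_PORT is a hypothetical variable; 3306 is the MySQL default port.
        "database.port": int(env.get("MYSQL_PORT", "3306")),
    }
```

The returned keys follow datajoint's dotted config naming (for example database.host), so the dictionary could be merged into the datajoint config inside the container.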

Speed

The first time you run make build, the docker image will be built from scratch. This can take a while, depending on the size of the conda environment. Subsequent builds will be faster, as docker will cache the layers.

To speed up the process, projects that do not use the position pipeline can remove the line in Docker_hub.Dockerfile that installs ffmpeg and other dependencies.

If your build is still slow, try removing unnecessary packages from your conda environment.yml file. Note that running make build copies the file from its original location, so make your changes in the original file.

Security

This repository is intended for use in a secure environment. It is not intended for use in a production environment.

By default, the jupyter notebook server password is the value of the paper ID variable.

Troubleshooting

If you encounter any issues, please check the status of the docker containers with docker ps -a. This will show the status of containers db and hub. If either is 'restarting', you can check the logs with docker logs <name>.
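
The check above can also be scripted. This is a minimal sketch that assumes the tab-separated output of docker ps -a --format '{{.Names}}\t{{.Status}}' and that a restarting container's status text begins with "Restarting"; the function names are hypothetical.

```python
import subprocess


def restarting(ps_output):
    """Return names of containers whose status begins with 'Restarting'."""
    names = []
    for line in ps_output.splitlines():
        name, _, status = line.partition("\t")
        if status.startswith("Restarting"):
            names.append(name)
    return names


def docker_ps():
    """Run `docker ps -a` and return one 'name<TAB>status' line per container."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling restarting(docker_ps()) lists the containers to inspect with docker logs <name>.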

Conda Fails

If conda environment creation fails, you may need to remove items from the environment.yml that require GPU support like jax.
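
As one way to remove such packages, the sketch below drops matching dependency lines from the text of an environment.yml. It is a line-based filter rather than a YAML parser, assumes entries of the form "- name" or "- name=version", and does not handle channel-prefixed entries. The function name drop_packages is hypothetical.

```python
import re


def drop_packages(yml_text, names):
    """Remove dependency lines for the named packages from environment.yml text."""
    if not names:
        return yml_text
    # Match "- name" optionally followed by a version specifier, and nothing else.
    pattern = re.compile(
        r"^\s*-\s*(?:%s)\s*(?:[=<>!~].*)?$" % "|".join(map(re.escape, names))
    )
    kept = [line for line in yml_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept) + "\n"
```

Names are matched exactly, so dropping jax leaves jaxlib in place unless it is listed as well.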

Table Declaration, Collation

By default, the Makefile will copy the sql files to the export_data/ directory and run the following commands on each file:

sed -i 's/ DEFAULT CHARSET=[^ ]\w*//g' _Populate_YourPaper.sql
sed -i 's/ DEFAULT COLLATE [^ ]\w*//g' _Populate_YourPaper.sql

This preempts (a) an OperationalError when trying to import a table or (b) SQL ERROR 3780 (HY000) in the docker logs.

What does this do?

These sed commands remove encoding specifications from the sql file(s).

CREATE TABLE your_table (
    ...
) ENGINE=InnoDB DEFAULT CHARSET=latin1 DEFAULT COLLATE latin1_swedish_ci COMMENT='X';

Will become:

CREATE TABLE your_table (
    ...
) ENGINE=InnoDB COMMENT='X';

The line with ENGINE=InnoDB should always end in ;. It may or may not have a COMMENT field.
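
For readers who want to try the substitutions outside of make, here is a minimal Python sketch that uses the same two patterns as the sed commands. The function name strip_encoding and the sample line are illustrative only.

```python
import re

# Same patterns as the two sed commands.
CHARSET = re.compile(r" DEFAULT CHARSET=[^ ]\w*")
COLLATE = re.compile(r" DEFAULT COLLATE [^ ]\w*")


def strip_encoding(sql):
    """Remove default charset and collation clauses from SQL text."""
    return COLLATE.sub("", CHARSET.sub("", sql))


line = ") ENGINE=InnoDB DEFAULT CHARSET=latin1 DEFAULT COLLATE latin1_swedish_ci COMMENT='X';"
print(strip_encoding(line))
```

Note that the patterns only match clauses written exactly as DEFAULT CHARSET=<name> and DEFAULT COLLATE <name>. If your sql files spell these differently, the patterns need adjusting.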

Table Declaration, Key Length

Spyglass instances declared before version 0.4.3 allow longer keys than the MySQL defaults permit. This may cause the import of downstream tables to fail with an Excessive key length error. By default, the Makefile will run the following command on each file, mirroring the adjustments from PR #664.

sed -i \
-e 's/ `nwb_file_name` varchar(255)/ `nwb_file_name` varchar(64)/g' \
-e 's/ `analysis_file_name` varchar(255)/ `analysis_file_name` varchar(64)/g' \
-e 's/ `interval_list_name` varchar(200)/ `interval_list_name` varchar(170)/g' \
-e 's/ `position_info_param_name` varchar(80)/ `position_info_param_name` varchar(32)/g' \
-e 's/ `mark_param_name` varchar(80)/ `mark_param_name` varchar(32)/g' \
-e 's/ `artifact_removed_interval_list_name` varchar(200)/ `artifact_removed_interval_list_name` varchar(128)/g' \
-e 's/ `metric_params_name` varchar(200)/ `metric_params_name` varchar(64)/g' \
-e 's/ `auto_curation_params_name` varchar(200)/ `auto_curation_params_name` varchar(36)/g' \
-e 's/ `sort_interval_name` varchar(200)/ `sort_interval_name` varchar(64)/g' \
-e 's/ `preproc_params_name` varchar(200)/ `preproc_params_name` varchar(32)/g' \
-e 's/ `sorter` varchar(200)/ `sorter` varchar(32)/g' \
-e 's/ `sorter_params_name` varchar(200)/ `sorter_params_name` varchar(64)/g' \
_Populate_YourPaper.sql

This may result in being unable to import keys longer than the specified length. If you encounter this issue, you may need to adjust the sed commands in the Makefile to match the keys in your sql files.
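
To anticipate this, you could check your key values against the shortened lengths before importing. The sketch below copies the limits from the sed command above; the function name too_long is hypothetical, and you would supply the values from your own data.

```python
# Shortened varchar lengths, copied from the sed command above.
NEW_LIMITS = {
    "nwb_file_name": 64,
    "analysis_file_name": 64,
    "interval_list_name": 170,
    "position_info_param_name": 32,
    "mark_param_name": 32,
    "artifact_removed_interval_list_name": 128,
    "metric_params_name": 64,
    "auto_curation_params_name": 36,
    "sort_interval_name": 64,
    "preproc_params_name": 32,
    "sorter": 32,
    "sorter_params_name": 64,
}


def too_long(field, values, limits=NEW_LIMITS):
    """Return the values that exceed the shortened varchar length for `field`."""
    limit = limits[field]
    return [value for value in values if len(value) > limit]
```

Any value returned by too_long would not fit the shortened column, so the matching sed expression in the Makefile would need a larger length.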

Elevated Access

The default hub container does not have sudo access. If you need to install additional packages or debug within the container, you may wish to do the following:

Admin within the container

In the Dockerfile, grant sudo to the default user, add the mysql credentials, and install mysql-client to allow command line access to the database.

USER root

# Allow sudo
RUN echo "jovyan:jovyanpassword" | chpasswd
RUN echo "jovyan ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/jovyan
# Add mysql credentials - Vars must also be added to docker-compose.yml
ARG MYSQL_HOST
ARG MYSQL_USER
ARG MYSQL_ROOT_PASSWORD
# Add default mysql credentials
RUN echo -e "\
[client]\n\
host=${MYSQL_HOST}\n\
user=${MYSQL_USER}\n\
password=${MYSQL_ROOT_PASSWORD}\n\n\
[mysqld]\n\
character-set-server = latin1\n\
collation-server = latin1_swedish_ci" > ${HOME}/.my.cnf
RUN apt update && apt install mysql-client -y

USER ${NB_UID}

Each ARG item must also be added to the docker-compose.yml file under the hub service:

    build:
      context: .
      dockerfile: Dockerfile
      args:
        MYSQL_HOST: db
        MYSQL_USER: root
        MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}

And add GRANT_SUDO=yes to the .env file.

Footnotes

  1. The .sh scripts generated by Spyglass must first be run by a database administrator to create the database and tables. The resulting .sql files will then be used to populate the Docker database.

  2. If your paper depends on a specific version of Spyglass or additional custom packages, please link to these in your notebooks, and ensure they are included in the environment.yml file in the export directory. You can find the version of Spyglass at the top of any .sql file, and find the link in the list of Spyglass tags.
