The project aims to create an interactive visualisation framework for genome mutations in gene and protein networks. The idea is to display information from a co-developed database in the form of different interactive "views". Because the user can quickly switch between the views, they will be able to examine an analysed gene or protein, together with its context and interactions, from different angles. The framework will also provide advanced filtering and interactive loading with AJAX requests.
The needleplot visualisation is now available in a separate repository: reimandlab/needleplot.
The application is Open Source and is licensed under the terms of the GNU Lesser General Public License.
The project is developed with Python 3. It uses Flask as a web framework, with database access provided by SQLAlchemy. Templating is performed with Jinja2 on the server side and Nunjucks.js on the client side (they have mostly compatible syntax). On the frontend the styles are written with SASS; the visualizations are based on jQuery and D3.js. All HTML, CSS and JS features used are required to reach a 95% level of browser support, as calculated by caniuse.com.
The website is developed inside a Python 3 virtual environment. To quickly recreate the environment, use:
virtualenv -p python3 virtual_environment
source virtual_environment/bin/activate
cd website
python3 -m pip install -r requirements.txt
If the commands above fail on Debian/Ubuntu machines, look at the bottom of this page, where alternative instructions are given. An additional requirement is the bsddb3 package, which is often easier to install system-wide with your package manager (on Debian it is named python3-bsddb3). You may need to add a symbolic link to the package inside your virtual environment.
To create a basic local copy of ActiveDriverDB you need a machine with at least 4 GB of RAM. If you wish to import genomic mappings for genome variant annotation, you will need more than 10 GB of RAM (16 GB recommended).
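To check in advance whether a machine meets these thresholds, a minimal sketch like the one below reads the total physical memory with `os.sysconf`. It assumes a POSIX system exposing `SC_PAGE_SIZE` and `SC_PHYS_PAGES` (true on Linux); the thresholds come from the paragraph above, the function names are hypothetical, and GB is treated as GiB, so a nominal 16 GB machine that reports slightly less will fall into the second category.

```python
import os

GIB = 1024 ** 3
# Thresholds from the text: 4 GB for a basic copy, more than 10 GB
# (16 GB recommended) for importing genomic mappings.
BASIC_MINIMUM = 4 * GIB
MAPPINGS_MINIMUM = 10 * GIB
MAPPINGS_RECOMMENDED = 16 * GIB


def total_ram_bytes() -> int:
    """Return the total physical memory in bytes (POSIX/Linux only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def assess_ram(total: int) -> str:
    """Classify an amount of RAM against the documented requirements."""
    if total >= MAPPINGS_RECOMMENDED:
        return "ok for genomic mappings (recommended amount)"
    if total > MAPPINGS_MINIMUM:
        return "ok for genomic mappings (below recommended 16 GB)"
    if total >= BASIC_MINIMUM:
        return "ok for a basic copy only"
    return "insufficient"


if __name__ == "__main__":
    ram = total_ram_bytes()
    print(f"{ram / GIB:.1f} GiB: {assess_ram(ram)}")
```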
For full deployment, two MySQL databases will be needed: one for biological data and one for the CMS.
You need to create them, along with the relevant database users and privileges. This can be achieved with queries like:
CREATE DATABASE db_bio;
CREATE DATABASE db_cms;
CREATE USER 'user'@'localhost' IDENTIFIED BY 'pass';
GRANT ALL PRIVILEGES ON db_bio.* TO 'user'@'localhost';
GRANT ALL PRIVILEGES ON db_cms.* TO 'user'@'localhost';
GRANT INSERT, DELETE, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON mysql.* TO 'user'@'localhost';
Remember to set a secure password; the user, database and host names are adjustable too. You may also create a separate user for each database; this setup is supported as well. Privileges on the mysql database are required to allow the creation of functions.
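Since the names in the queries above are adjustable, a small generator can help avoid typos when substituting your own. The sketch below is hypothetical (it is not part of the project's scripts): it reproduces the statements above for custom names, rejects database and user names that are not plain identifiers, and escapes the password and host as MySQL string literals (backslashes and single quotes), assuming the default sql_mode.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def quote_literal(value: str) -> str:
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def setup_statements(bio_db: str, cms_db: str, user: str, password: str,
                     host: str = "localhost") -> list[str]:
    """Build the database setup statements for custom names."""
    for name in (bio_db, cms_db, user):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    account = f"{quote_literal(user)}@{quote_literal(host)}"
    return [
        f"CREATE DATABASE {bio_db};",
        f"CREATE DATABASE {cms_db};",
        f"CREATE USER {account} IDENTIFIED BY {quote_literal(password)};",
        f"GRANT ALL PRIVILEGES ON {bio_db}.* TO {account};",
        f"GRANT ALL PRIVILEGES ON {cms_db}.* TO {account};",
        # Privileges on the mysql database allow the creation of functions.
        "GRANT INSERT, DELETE, CREATE ROUTINE, ALTER ROUTINE, EXECUTE "
        f"ON mysql.* TO {account};",
    ]


if __name__ == "__main__":
    print("\n".join(setup_statements("db_bio", "db_cms", "user", "pass")))
```

The output can be piped into the `mysql` client as an administrative user.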
Afterwards, you can start writing your configuration by copying the example configuration file:
cp example_config.py config.py
Carefully replace the variables mentioned in the comments in this file, as some of them are critical for the application's security. To check whether the database is configured properly, run the following command:
./manage.py
If you see the following message at the very end:
Scripts loaded successfuly, no tasks specified.
it indicates that everything is working properly.
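If you want to run this check from a provisioning script, a minimal sketch is shown below. It assumes that the message above is printed verbatim (spelling included) as the last non-empty line of the output; the function names are hypothetical. Standard error is merged into standard output so that the lines keep their original order.

```python
import subprocess

SUCCESS_MESSAGE = "Scripts loaded successfuly, no tasks specified."


def configuration_ok(output: str) -> bool:
    """True if the last non-empty line of the output is the success message."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == SUCCESS_MESSAGE


def check_manage_py(website_dir: str = ".") -> bool:
    """Run ./manage.py without arguments and inspect what it printed."""
    result = subprocess.run(
        ["./manage.py"],
        cwd=website_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep warnings and messages in order
        text=True,
    )
    return configuration_ok(result.stdout)
```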
All data files can be downloaded easily with the ./download.sh script from the website/data directory.
Before starting the server, the data have to be imported. The safest way to do this is to run:
./manage.py load all
although you might want to use Python's optimized mode, which makes the import considerably faster. It should not be used with new, untested data, since assertions are not checked in this mode:
python3 -OO manage.py load all
The given arguments instruct the program to create and import data for: DNA -> protein mappings, the biological relational database and the Content Management System (CMS). During CMS creation you will be asked to set up login credentials for the root user.
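The trade-off of the optimized mode described above can be observed directly: under `-O` or `-OO`, `assert` statements are compiled away, so checks that would have stopped an import of bad data are silently skipped. The standalone sketch below (unrelated to `manage.py` itself) runs a one-line snippet in a fresh interpreter with and without `-OO`.

```python
import subprocess
import sys

# A failing assertion followed by a print that only runs if it is skipped.
SNIPPET = "assert False, 'assertion checked'; print('assertions skipped')"


def run_snippet(*flags: str) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


normal = run_snippet()
optimized = run_snippet("-OO")
print(normal.returncode, optimized.returncode, optimized.stdout.strip())
```

Without flags the interpreter exits with an `AssertionError`; with `-OO` it prints "assertions skipped" and exits normally.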
Warning: after each migration affecting proteins' identifiers it is crucial to reimport the mappings; otherwise the mappings will point to wrong proteins!
With the manage.py script you can load or remove specific parts of the database and perform very simple automigration (for newly created models). For further details, use the built-in help option:
./manage.py -h
Note that the help adapts to the specified subcommands (e.g. it shows more details for ./manage.py load -h, and even more for ./manage.py load mutations -h).
MySQL-specific: if you see the message "MySQL server has gone away", try running:
set global max_allowed_packet=1073741824;
If you don't want to perform the steps described below for every single deployment, you can use the deploy.sh script (after installing all dependencies listed in the steps below).
Stylesheet files are an important part of this visualisation framework. To compile them, you will need to have the sass gem installed.
To create all *.css files, run the following command from the website directory:
sass --update .:.
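Because `sass --update` only recompiles stylesheets whose sources have changed, it can be handy to see which outputs are out of date. The sketch below approximates that check under stated simplifications: it compares the modification time of each non-partial `.scss`/`.sass` file with the `.css` file next to it, and it does not follow `@import` dependencies, so it is coarser than Sass's own logic. The function name is hypothetical.

```python
from pathlib import Path


def stale_stylesheets(root: str) -> list[Path]:
    """List Sass sources whose .css output is missing or older than the source.

    Partials (names starting with "_") are skipped, since Sass does not
    compile them on their own. Changes in imported partials are NOT detected.
    """
    stale = []
    for source in Path(root).rglob("*"):
        if source.suffix not in (".scss", ".sass") or source.name.startswith("_"):
            continue
        target = source.with_suffix(".css")
        if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
            stale.append(source)
    return sorted(stale)
```

For example, `stale_stylesheets("website")` lists the sources that `sass --update .:.` would most likely recompile.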
The Nunjucks templating system is used for client-side templating. It allows moving some repeatedly performed templating tasks to the user's browser, which reduces data transfer and speeds up site loading. Its syntax is nearly compatible with Jinja's.
To keep this process efficient, templates should be precompiled. To do so, you will need a full nunjucks installation, obtained for example with npm (which you should be able to install with your system's package manager):
sudo npm install -g nunjucks
Afterwards, compile the templates with:
cd website/static/js_templates
./precompile.sh
And you are done. When DEBUG = False, precompiled templates will be loaded automatically.
For periodic tasks, a CRON-like package, Advanced Python Scheduler, is used; it is fully integrated with the application code and no additional setup is required.
The job functions are defined in the jobs.py file, and the scheduling information is stored in config.py, in the JOBS variable.
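The exact schema of JOBS is not documented here. The sketch below assumes the dict-per-job layout used by Flask-APScheduler, where "func" is a "module:function" reference and the remaining keys are passed on as APScheduler trigger arguments; the job ids and function names are hypothetical. The small `resolve_ref` helper can be used to catch typos in the references before deployment.

```python
import importlib

# Hypothetical entries, assuming the Flask-APScheduler dict-per-job layout.
JOBS = [
    {
        "id": "nightly_update",            # hypothetical job id
        "func": "jobs:update_statistics",  # "module:function" in jobs.py (hypothetical)
        "trigger": "cron",
        "hour": 3,
        "minute": 0,
    },
    {
        "id": "hourly_cleanup",            # hypothetical job id
        "func": "jobs:cleanup",            # hypothetical
        "trigger": "interval",
        "hours": 1,
    },
]


def resolve_ref(ref: str):
    """Turn a "module:attribute" reference into the object it names."""
    module_name, _, attribute = ref.partition(":")
    if not attribute:
        raise ValueError(f"expected 'module:attribute', got {ref!r}")
    obj = importlib.import_module(module_name)
    for part in attribute.split("."):
        obj = getattr(obj, part)
    return obj
```

Running `for job in JOBS: resolve_ref(job["func"])` from the website directory would raise an `ImportError` or `AttributeError` for any reference that does not exist.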
To manage and execute user-provided mutation searches, the Celery Distributed Task Queue is used, with RabbitMQ as both the broker and the backend. Both RabbitMQ and Celery need to be run as services and set up properly, as described in the Celery documentation. On Debian-based machines RabbitMQ may be installed as a service directly from the repositories.
To run the Celery worker as a script, use the following command:
celery -A celery_worker.celery worker
For deployment it should be started as a service.
A major part of the configuration will be performed automatically by setup.sh, but you need to amend the configuration file (celeryd) so that all paths are absolute and correct.
To start the service, use the init.d script:
/etc/init.d/celeryd {start|stop|restart|status}
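Both the worker and the web application need to reach the same RabbitMQ broker. As a hedged sketch, the helper below builds an AMQP URL in the form Celery accepts as a broker URL (for example `amqp://user:password@localhost:5672/vhost`, with `//` denoting the default `/` virtual host). Where the project reads this URL from is not specified here, so the function name and its use are assumptions.

```python
from urllib.parse import quote


def rabbitmq_url(user: str, password: str, host: str = "localhost",
                 port: int = 5672, vhost: str = "/") -> str:
    """Build an AMQP URL for RabbitMQ, e.g. for use as a Celery broker URL."""
    # Percent-encode credentials so characters like "@" or ":" are safe.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # The default virtual host "/" is conventionally written as "//".
    path = "/" if vhost == "/" else quote(vhost, safe="")
    return f"amqp://{credentials}@{host}:{port}/{path}"
```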
To start the webserver simply type:
./run.py
To adjust the port or IP address, check the -h switch of the run.py script (note: running on port 80 may require sudo privileges).
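Before starting the server on a custom address, it can help to confirm that the port is actually available. The sketch below (a hypothetical helper that does not call run.py) simply tries to bind the address: binding fails if another process already listens there, or if an unprivileged user tries a port below 1024 such as 80.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Try to bind host:port; return False if it is taken or not permitted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # e.g. EADDRINUSE (already in use) or EACCES (port 80 as non-root)
            return False
    return True
```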
Deployment on an Apache2 server is a more powerful alternative to the Werkzeug webserver.
As you may want to use a virtual environment for this application, website/app.wsgi provides a ready-to-go activation script for Apache2 (assuming that your virtual environment is named virtual_environment). The mod_wsgi extension is required (apt-get install libapache2-mod-wsgi-py3 on Debian/Ubuntu).
The following extract from a configuration file might help you write your own configuration:
DocumentRoot /some_path/website
# Prevent 'Timeout when reading response headers from daemon process'
WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess app user=some_username group=some_group threads=2
WSGIScriptAlias / /some_path/website/app.wsgi
<Directory /some_path/website>
    WSGIProcessGroup app
    WSGIApplicationGroup %{GLOBAL}
    # Apache 2.2 and older only (do not use with Apache 2.4 or newer):
    # Order deny,allow
    # Allow from all
    # Apache 2.4 or newer (Apache does not allow comments on directive lines):
    Require all granted
</Directory>
# Serve static files directly:
Alias /static/ /some_path/website/static/
<Directory /some_path/website/static/*>
    # Apache 2.2 and older only (do not use with Apache 2.4 or newer):
    # Order allow,deny
    # Allow from all
    # Apache 2.4 or newer:
    Require all granted
</Directory>
<Location /static>
    SetHandler None
</Location>
The appropriate configuration files can usually be found in directories such as /etc/apache2/sites-enabled/.
Apart from the soft (software, CMS-controlled) maintenance mode, an additional maintenance mode for more advanced work is available.
To set it up, add the following code to the Apache configuration:
# Handle maintenance mode:
Alias /maintenance/ /some_path/website/static/maintenance.html
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/maintenance-mode-on -f
RewriteCond %{REQUEST_URI} !^/static.*
RewriteCond %{REQUEST_URI} !^/maintenance
RewriteRule ^(.*) /maintenance/ [R=503,L]
ErrorDocument 503 /maintenance/
RewriteCond %{DOCUMENT_ROOT}/maintenance-mode-off -f
RewriteCond %{REQUEST_URI} ^/maintenance
RewriteRule ^(.*) / [R,L]
and enable the rewrite module:
sudo a2enmod rewrite
Then, to enable the maintenance mode, run from within the website directory:
mv maintenance-mode-off maintenance-mode-on
and to disable it:
mv maintenance-mode-on maintenance-mode-off
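If maintenance mode is toggled from a deployment script rather than by hand, the same renames can be done in Python. This is a hypothetical helper, not part of the repository; it assumes that the flag files live in the website directory, which is the DocumentRoot checked by the RewriteCond rules, and that one of the two files already exists.

```python
from pathlib import Path

ON = "maintenance-mode-on"
OFF = "maintenance-mode-off"


def set_maintenance(website_dir: str, enabled: bool) -> None:
    """Rename the flag file that the Apache RewriteCond rules test for."""
    root = Path(website_dir)
    source, target = (OFF, ON) if enabled else (ON, OFF)
    if (root / target).exists():
        return  # already in the requested state
    # Raises FileNotFoundError if neither flag file exists yet.
    (root / source).rename(root / target)


def maintenance_enabled(website_dir: str) -> bool:
    """True if the "on" flag file is present."""
    return (Path(website_dir) / ON).exists()
```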
For Apache2, increasing the maximum URI length is recommended in order to handle long GET requests, e.g. for filters that include a large number of disease names. To do so, edit the Apache configuration (typically /etc/apache2/apache2.conf), appending:
LimitRequestLine 10000
LimitRequestFieldSize 10000
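To estimate whether a given filter combination would exceed the limit, the sketch below measures the HTTP request line (`GET <path>?<query> HTTP/1.1`) that a browser would send. Apache's default `LimitRequestLine` is 8190 bytes; the query parameter name `disease` used in the usage note is hypothetical and may differ from the one the site actually uses.

```python
from urllib.parse import urlencode

APACHE_DEFAULT_LIMIT = 8190  # Apache's default LimitRequestLine, in bytes
RECOMMENDED_LIMIT = 10000    # the value suggested above


def request_line_length(path: str, params: list[tuple[str, str]]) -> int:
    """Length in bytes of the request line 'GET <path>?<query> HTTP/1.1'."""
    query = urlencode(params)  # percent-encoded, hence pure ASCII
    line = f"GET {path}?{query} HTTP/1.1"
    return len(line.encode("ascii"))
```

For instance, 400 disease names of about 25 characters each already produce a request line of roughly 14,000 bytes, well above the default limit.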
You can modify the default path to the Python executable used by WSGI by adding a python_path argument to the WSGIDaemonProcess directive. This allows you to use a small wrapper script that turns the optimization mode on. Here is an example script:
#!/bin/sh
exec python3 -OO "$@"
To log in to the root account (created with the manage.py script), visit the /login/ page on your server. It will allow you to create, edit and remove standalone pages.
For proper compilation of some requirements, additional software will be needed on Debian-based servers. The required packages are:
build-essential python3 libmysqlclient-dev python3-dev git python3-bsddb3 pigz nodejs openjdk-7-jdk
Alternative commands to create the virtual environment (a workaround for Debian installations lacking some essential python3 packages):
python3 -m venv --without-pip virtual_environment
source virtual_environment/bin/activate
curl https://bootstrap.pypa.io/get-pip.py | python3
deactivate
source virtual_environment/bin/activate
cd website
python3 -m pip install -r requirements.txt
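The first of these commands can also be expressed with Python's standard `venv` module, which may be convenient inside a provisioning script. The sketch below is equivalent to `python3 -m venv --without-pip virtual_environment`; pip still has to be bootstrapped afterwards with get-pip.py as shown above. The function name is hypothetical.

```python
import venv
from pathlib import Path


def create_env_without_pip(env_dir: str) -> Path:
    """Create a virtual environment without pip (like `venv --without-pip`)."""
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)
```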
All tests are placed in the website/tests directory. All steps are explained in the readme.md file inside this subdirectory.
Browser compatibility testing is provided by BrowserStack, which allows cloud testing on desktop browsers and real iOS and Android devices. It also supports integration with automated testing.
The project is developed with support from the Ontario Institute for Cancer Research and received support from Google Summer of Code 2016.
BrowserStack supports this open source project allowing us to use their testing systems for free.