Benjamin Linard edited this page Jul 18, 2018 · 34 revisions

Welcome to the RAPPAS wiki!

Here you will find tutorials describing use cases of RAPPAS, as well as comments on the analysis options it provides. All tutorials are valid for UNIX and MAC operating systems.

RAPPAS can be installed and run on Windows but you will need to adapt the commands accordingly.

Table of contents

A. Detailed Installation

B. Tutorials

C. Comments on ancestral sequence reconstruction

D. A non-exhaustive list of other software related to phylogenetic placements

Detailed installation

Prerequisites

  • RAPPAS compilation requires a clean JDK 1.8 javac compiler installation. Java >=1.8 is a compulsory requirement as some operations are based on Lambda expressions.
  • Apache Ant is used to facilitate the compilation.

Debian distributions (Debian, Ubuntu, Mint ...)

1. If not already done, please install the Java JDK libraries

Using OpenJDK 1.8:

#install packages
sudo apt-get update
sudo apt-get install openjdk-8-jdk
#update relevant symlinks to make v1.8 default
sudo update-java-alternatives --set java-1.8.0-openjdk-amd64

Using the proprietary Oracle JDK 1.8:

#install packages
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
#update relevant symlinks to make v1.8 default
sudo apt-get install oracle-java8-set-default

2. Install Apache Ant

sudo apt-get install ant

If everything went fine in steps 1 and 2, the command java -version should return lines similar to:

java version "1.8.0_161"          <<--- the version must be >= 1.8
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
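
The version check above can also be scripted. The following is a minimal sketch, assuming the version string has the form printed by java -version (for example "1.8.0_161" in the legacy numbering or "11.0.2" in the newer one). The function name java_version_ok is hypothetical and not part of RAPPAS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RAPPAS): succeeds if a Java version
# string denotes Java 1.8 (Java 8) or later.
java_version_ok() {
    local version="$1" major minor
    major="${version%%.*}"        # text before the first dot
    minor="${version#*.}"         # text after the first dot
    major="${major%%[!0-9]*}"     # keep leading digits only
    minor="${minor%%[!0-9]*}"
    # Newer numbering: 11.0.2 -> major=11, always recent enough
    if [ "${major:-0}" -gt 1 ]; then
        return 0
    fi
    # Legacy numbering: 1.8.0_161 -> major=1, minor=8
    [ "${major:-0}" -eq 1 ] && [ "${minor:-0}" -ge 8 ]
}

# Usage: `java -version` prints on stderr; the version is the quoted
# field of the first line.
if command -v java >/dev/null 2>&1; then
    installed="$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)"
    if java_version_ok "$installed"; then
        echo "Java $installed is recent enough"
    else
        echo "Java $installed is too old, 1.8 or later is required"
    fi
fi
```

If the first line of java -version does not contain a quoted version (for instance because the JVM prints an extra notice first), the check fails conservatively and reports the version as too old.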

3. Download and build RAPPAS

git clone -b master https://github.com/blinard-BIOINFO/RAPPAS.git
cd RAPPAS && ant -f build-cli.xml

4. Test RAPPAS executable

If everything went fine, you should be able to launch RAPPAS. A simple test is to display its help page:

java -jar dist/RAPPAS.jar -h

Note: java -jar XXX.jar calls the Java interpreter on a jar archive. Any RAPPAS command-line option must appear after these elements.

MAC OS

1. If not already done, please install the Java JDK libraries

Download the JDK from the Oracle JDK 8 webpage.

2. Unpack the JDK

Run the file jdk-8uxxx-macosx-x64.dmg to install the JDK. Note that you may have to explicitly export the JDK location with the command below (the quotes are needed because the path contains a space):

export JAVA_HOME="/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home"

To avoid repeating this step, add this line to your bash profile.

If everything went fine, the command java -version should return lines similar to:

java version "1.8.0_161"          <<--- the version must be >= 1.8
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
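
The JAVA_HOME export mentioned in step 2 can be made persistent without adding it twice to the profile. This is a minimal sketch: the helper name append_once is hypothetical, and it assumes that your shell reads ~/.bash_profile and that the JDK path is the one quoted in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (so running it twice has no extra effect).
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: match whole lines, -F: treat the line as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (left commented out so that nothing is modified by accident):
# append_once 'export JAVA_HOME="/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home"' "$HOME/.bash_profile"
```

Open a new terminal afterwards so that the profile is read again.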

3. Install Apache Ant

Ant is bundled with the latest version of MAC OS. If you need to install it:

#install brew if not already done
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew update
brew install ant

4. Download and build RAPPAS

git clone -b master https://github.com/blinard-BIOINFO/RAPPAS.git
cd RAPPAS && ant -f build-cli.xml

5. Test RAPPAS executable

If everything went fine, you should be able to launch RAPPAS. A simple test is to display its help page:

java -jar dist/RAPPAS.jar -h

Note: java -jar XXX.jar calls the Java interpreter on a jar archive. Any RAPPAS command-line option must appear after these elements.

Tutorials

Tutorial n°1: Accurate placement of Hepatitis C Virus

Tutorial n°2: Diversity estimation based on 16S rRNAs barcodes

This tutorial is analogous to the tutorial provided on this page. It represents a typical case of bacterial community analysis.

  • The reference tree contains 652 bacterial taxa and is built from the alignment of their 16S rRNA genes (a classical barcode in bacterial metagenomics). It was retrieved from (link).
  • The sample is a collection of millions of real-world amplicons of 150 bp (typically generated during a metabarcoding experiment). These were retrieved from the Earth Microbiome Project (Thompson et al., 2017) via the European Nucleotide Archive (Silvester et al., 2018) using custom scripts from github.com/biocore/emp.

The main steps of the approach are:

  • Build a RAPPAS database from the reference tree.
  • Place 1 million amplicons on this reference.
  • Produce a diversity measure from the placement results.
  • Optionally, visualize the placements in iTOL.

1. Download and build RAPPAS

git clone -b master https://github.com/blinard-BIOINFO/RAPPAS.git
cd RAPPAS && ant -f build-cli.xml
#if everything went fine, prints the program help:
java -jar dist/RAPPAS.jar -h

2. Build the RAPPAS Database

cd ../tutorials/tutorial_2
java -Xmx8G -jar ../../RAPPAS/dist/RAPPAS.jar -m b -w ./ -k 8 -r refalign.fasta -t reftree.nwk -s nucl -b 

Comments on command-line parameters:

  • The '-Xmx' option: sets the maximum amount of memory that can be accessed by the process. For instance, -Xmx1024m allows a maximum of 1 GB, -Xmx16G would allow a maximum of 16 GB...
  • The '-w' option: sets the working directory, in which temporary and log files related to the database construction are created. At the end, the database file itself is also created in this directory.
  • Impact of the '-k' parameter: higher values of k involve higher memory requirements (adapt the -Xmx option accordingly). The impact of the size of k depends on the reference dataset, but for classical taxonomic markers (16S rRNAs, cox1, rbcL...) we observed very limited differences between different values of k (see RAPPAS manuscript). k=8 is the default and should be fine for many applications.
  • The '-b' option: sets the path to the binary of PhyML or PAML, which is called during the ancestral sequence reconstruction step. In our tests, PhyML was much faster than PAML but required more memory. We recommend using PhyML in most cases and trying PAML only when very large reference trees and long reference alignments are considered.
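
As a worked example of the '-Xmx' notation described above, the sketch below converts a heap size value into megabytes, using the JVM convention that 1G equals 1024m. The function name xmx_to_mb is hypothetical, and only the m/M and g/G suffixes are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a JVM heap size such as -Xmx8G or 1024m
# into megabytes. Only the m/M and g/G unit suffixes are supported.
xmx_to_mb() {
    local value="${1#-Xmx}"          # accept "8G" as well as "-Xmx8G"
    local number="${value%[mMgG]}"   # strip one trailing unit letter
    case "$value" in
        *[mM]) echo "$number" ;;
        *[gG]) echo $(( number * 1024 )) ;;
        *)     echo "unsupported heap size: $1" >&2; return 1 ;;
    esac
}

xmx_to_mb -Xmx1024m   # prints 1024
xmx_to_mb -Xmx8G      # prints 8192
xmx_to_mb -Xmx16G     # prints 16384
```

Comparing the result with the memory actually available on the machine is a quick way to check that the chosen value is realistic before raising k.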

3. Place a sample of 1,000,000 amplicons

java -Xmx8G -jar ../../RAPPAS/dist/RAPPAS.jar -m p -d XXX.union -q sample.fasta -w ./

Comments on command-line parameters:

  • The '-d' option: sets the path to the RAPPAS database file created during the previous operation.

This produces a 'sample.fasta.jplace' file in the "/log" directory of the working directory (option -w). The 'jplace' format has a published file specification (manuscript). The result file can consequently be loaded into many external software packages that exploit placement results.
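
Before and after the placement, two quick checks can be useful: counting the query sequences, and locating the result file. This is a minimal sketch; the helper names are hypothetical, and the result path simply follows the description above (a file named after the query, with the .jplace extension, in the "log" directory of the working directory).

```shell
#!/usr/bin/env bash
# Count the sequences in a FASTA file: one header line starting
# with '>' per sequence.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: build the expected result path from the working
# directory (option -w) and the query file (option -q), assuming the
# layout described in the text above.
expected_jplace() {
    local workdir="${1%/}" query
    query="$(basename "$2")"
    echo "$workdir/log/$query.jplace"
}

# Usage, run from the tutorial directory
if [ -f sample.fasta ]; then
    echo "query sequences: $(count_fasta_records sample.fasta)"
    result="$(expected_jplace ./ sample.fasta)"
    if [ -f "$result" ]; then
        echo "placements written to $result"
    else
        echo "no result found at $result"
    fi
fi
```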

4. Basic diversity analysis

In this example, the exploitation of phylogenetic placement results is based on the GUPPY package. It can produce different diversity indexes (OTU alpha diversity, UniFrac-like measures...). To learn more about the statistics behind these measures, please read the documentation of Matsen and Gallagher.

This package is only one of the possible packages dedicated to the analysis of phylogenetic placement results. Visit the bottom of this page to find a more extensive list of packages in different languages (R, C++, ...).

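Note that guppy is not the Python package of the same name available through pip: it is a command-line program distributed as part of the pplacer suite. The commands below are valid in Linux; search for equivalent commands if you are in MAC OS.

#If not already done, download the pplacer suite (it contains the guppy binary) and unpack it.
#Then make the binaries reachable from your shell.
#The directory below is a placeholder, adapt it to where you unpacked the archive.
export PATH="$PATH:/path/to/pplacer"

#check that guppy is found
command -v guppy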

#Calculate the Kantorovich-Rubinstein (KR) metric, which is a generalization of UniFrac,
#between different samples placed on the same reference tree.
guppy kr src/*.jplace
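
Since the KR metric compares several samples, each sample first has to be placed on the same reference. The loop below is a minimal sketch of that step, reusing the placement command from step 3. The sample names are placeholders, the database name XXX.union is kept as written above, and the wrapper place_sample and the DRY_RUN switch are hypothetical. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Placeholder settings, adapt them to your own files.
RAPPAS_JAR="${RAPPAS_JAR:-../../RAPPAS/dist/RAPPAS.jar}"
DATABASE="${DATABASE:-XXX.union}"
DRY_RUN="${DRY_RUN:-1}"     # 1: print the commands, 0: execute them

# Hypothetical wrapper around the placement command of step 3.
place_sample() {
    local query="$1"
    # Rebuild the positional parameters as the full command line
    set -- java -Xmx8G -jar "$RAPPAS_JAR" -m p -d "$DATABASE" -q "$query" -w ./
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Place every sample on the same reference database
for sample in sampleA.fasta sampleB.fasta; do
    place_sample "$sample"
done
```

Once all samples are placed, the resulting .jplace files can be passed together to guppy kr as shown above.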

Comments on ancestral sequence reconstruction

A non-exhaustive list of software related to phylogenetic placements

Placement post-analysis

Placement visualisation

Alternatives to RAPPAS
