NpSearch (NeuroPeptideSearch)

Please note that NpSearch is currently in beta. We are working on something new that is amazingly fast (i.e. a few seconds to run) and a lot better in every sense (it even has an easy-to-use clicky, pointy interface). So watch this space.

Introduction

NpSearch is a tool that helps identify novel neuropeptides. It is not based on homology to existing neuropeptides - rather, NpSearch is based on the common characteristics of neuropeptides and their precursors. In other words, it is a feature-based tool.

The results include the entire secretome, ordered by the likelihood of each sequence encoding a neuropeptide. As such, it is expected that you only need to analyse the top half of the results.

Importantly, NpSearch produces a highly visual HTML file where the signal peptide and potential cleavage sites are highlighted. Additionally, NpSearch produces a FASTA file of the results (i.e. the ordered secretome) that can easily be used in your own pipelines.

If you use this program, please cite us:

Moghul et al. (in prep) NpSearch: A Tool to Identify Novel Neuropeptides

NpSearch takes a transcriptomic or predicted proteomic dataset as input. Each sequence is analysed and awarded a relative score reflecting its likelihood of encoding a neuropeptide precursor. When provided with transcriptomic data, NpSearch translates each contig in all six frames and then extracts all potential open reading frames (methionine to stop codon). Each predicted protein sequence is then analysed for the following neuropeptide-related characteristics:

Signal peptide: All neuropeptide precursors must have a signal peptide. This is due to the fact that the final bioactive neuropeptide has to be secreted from the cell of synthesis in order to be functionally active.

Cleavage sites: Being derived from a precursor, the bioactive neuropeptide has to be cleaved out of that precursor. Prohormone convertase enzymes cleave these bioactive peptides at specific cleavage sites. As certain cleavage motifs are more likely to be cleaved than others, NpSearch scores sequences based on the type and number of cleavage sites present.

C-terminal Glycine: A significant number of bioactive neuropeptides have a C-terminal glycine that is amidated during post-translational modification. Such sequences are therefore awarded a higher score.

Repeated peptides: Numerous neuropeptide precursors are made up of multiple copies of the same neuropeptide. NpSearch attempts to cluster all potential cleaved neuropeptides, and then awards a higher score to sequences that produce larger clusters.

Acidic spacer regions: Neuropeptide precursors that contain multiple neuropeptide copies tend to have highly acidic regions that separate these copies. If detected by NpSearch, the sequence is awarded with a higher score.
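
To make the sequence-level features above more concrete, here is a minimal Ruby sketch of how they could be checked on a single protein sequence. It is not NpSearch's actual scoring code. The cleavage model (any run of two or more K/R residues), the 10-residue window and all method names are assumptions made for illustration, and exact-match grouping stands in for real clustering (NpSearch lists CD-HIT as a dependency). Signal peptide detection is left out because it is delegated to SignalP.

```ruby
# Illustrative feature checks only. Every motif, window size and method name
# here is an assumption, not NpSearch's real implementation.
CLEAVAGE_SITE = /[KR]{2,}/ # simplified model: runs of basic residues

# Fragments that are followed by a cleavage site (all but the C-terminal one).
def cleaved_peptides(protein)
  protein.split(CLEAVAGE_SITE, -1)[0...-1].reject(&:empty?)
end

# Number of putative cleavage sites in the sequence.
def cleavage_site_count(protein)
  protein.scan(CLEAVAGE_SITE).length
end

# Cleaved peptides ending in glycine, i.e. candidates for amidation.
def amidation_candidates(protein)
  cleaved_peptides(protein).select { |peptide| peptide.end_with?('G') }
end

# Size of the largest group of identical cleaved peptides (0 if none).
def largest_repeat_cluster(protein)
  cleaved_peptides(protein).group_by(&:itself).values.map(&:length).max || 0
end

# Highest fraction of acidic residues (D/E) in any window of the given size.
def max_acidic_fraction(protein, window: 10)
  return 0.0 if protein.length < window

  protein.chars.each_cons(window)
         .map { |residues| residues.count { |aa| 'DE'.include?(aa) } / window.to_f }
         .max
end

# Collect all feature values for one sequence.
def feature_summary(protein)
  {
    cleavage_sites: cleavage_site_count(protein),
    amidation_candidates: amidation_candidates(protein).length,
    largest_repeat_cluster: largest_repeat_cluster(protein),
    max_acidic_fraction: max_acidic_fraction(protein)
  }
end
```

Calling `feature_summary` on a made-up precursor returns one value per feature. How these values would be weighted into a final score is not described in this README, so the sketch stops short of combining them.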

After analysing each sequence in the input dataset, NpSearch produces a visual HTML file and a FASTA file, in which sequences that are more likely to encode a neuropeptide precursor are placed at the top. These result files can then be easily inspected and curated by researchers.
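
For readers who want a concrete picture of the six-frame translation and ORF extraction step described earlier, the following Ruby sketch shows one way it could be done. It is an illustration only: NpSearch lists EMBOSS (`getorf`) as a dependency and presumably relies on it rather than on code like this. The method names are hypothetical. The sketch assumes an uppercase or lowercase DNA alphabet, the standard genetic code, and that the length bounds (defaults borrowed from `--min_orf_length` and `--max_orf_length`) are measured in amino acids.

```ruby
# Illustrative only; not NpSearch's implementation.
BASES = %w[T C A G].freeze
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze

# Standard genetic code (NCBI translation table 1), codon => amino acid.
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

def reverse_complement(dna)
  dna.reverse.tr('ACGT', 'TGCA')
end

# Translate one reading frame; codons not in the table (e.g. with N) become 'X'.
def translate(dna, offset)
  (dna[offset..-1] || '').scan(/.{3}/).map { |codon| CODON_TABLE.fetch(codon, 'X') }.join
end

# Return every methionine-to-stop ORF (as protein) from all six frames.
# Assumption: min_length and max_length are counted in amino acids.
def extract_orfs(dna, min_length: 30, max_length: 600)
  forward = dna.upcase
  [forward, reverse_complement(forward)].flat_map do |strand|
    (0..2).flat_map do |offset|
      # Split on stop codons; the final fragment has no stop, so drop it.
      segments = translate(strand, offset).split('*', -1)[0...-1]
      orfs = segments.map { |segment| (start = segment.index('M')) && segment[start..-1] }
      orfs.compact.select { |orf| orf.length.between?(min_length, max_length) }
    end
  end
end
```

For example, `extract_orfs(contig, min_length: 50)` would keep only ORFs of at least 50 residues. Only the first methionine of each stop-delimited segment is used as a start, which is a simplification.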

Installation

Installation Requirements

  • Ruby (>= 2.0.0)
  • SignalP 4.1.*z (Available from here)
  • CD-HIT (Available from here - Suggested Installation via Homebrew or Linuxbrew - brew install homebrew/science/cd-hit)
  • EMBOSS (Available from here - Suggested Installation via Homebrew or Linuxbrew - brew install homebrew/science/emboss)

Installation

While in beta, it is suggested that you run NpSearch from source (i.e. the non-recommended method below)

Simply run the following command in the terminal.

gem install npsearch

If that doesn't work, try sudo gem install npsearch instead.

Running From Source (Not Recommended)

It is also possible to run from source. However, this is not recommended.

# Clone the repository.
git clone https://github.com/wurmlab/npsearch.git

# Move into the NpSearch source directory.
cd npsearch

# Install bundler
gem install bundler

# Use bundler to install dependencies
bundle install

# Optional: run tests, build documentation and build the gem from source
bundle exec rake

# Run NpSearch.
bundle exec npsearch -h
# note that `bundle exec` executes NpSearch in the context of the bundle

# Alternatively, install NpSearch as a gem
bundle exec rake install
npsearch -h

Usage

Verify that NpSearch is installed by running the following command in the terminal:

npsearch

You should see the following output.

* Description: A tool to identify novel neuropeptides.

* Usage: npsearch [Options] [Input File]

* Options
    -s path_to_signalp,              The full path to the signalp script. This can be downloaded from
        --signalp_path                CBS. See https://www.github.com/wurmlab/NpSearch for more
                                      information
    -t, --temp_dir path_to_temp_dir  The full path to the temp dir. NpSearch will create the folder and
                                      then delete the folder once it has finished using them.
                                      Default: Hidden folder in the current working directory
    -n, --num_threads num_of_threads The number of threads to use when analysing the input file
    -d, --debug                      Run in debug mode
    -l, --min_orf_length N           The minimum length of a potential neuropeptide precursor.
                                      Default: 30
    -m, --max_orf_length N           The maximum length of a potential neuropeptide precursor.
                                      Default: 600
    -h, --help                       Display this screen
    -v, --version                    Shows version

Exemplar Usage Scenario

The following runs NpSearch on an input fasta dataset.

npsearch -s /path/to/signalp -n NUM_THREADS INPUT_FASTA_FILE
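
If you want to call NpSearch from your own Ruby pipeline, a thin wrapper like the sketch below may be enough. It only uses the `-s` and `-n` options shown in the help output above. The method names `npsearch_command` and `run_npsearch` are hypothetical, and the sketch assumes `npsearch` is on your `PATH`.

```ruby
# Build the argument list for the command shown above.
# Hypothetical helper; not part of the NpSearch gem.
def npsearch_command(input_fasta, signalp_path:, num_threads: 1)
  ['npsearch', '-s', signalp_path, '-n', num_threads.to_s, input_fasta]
end

# Run NpSearch; returns true on success, false or nil otherwise.
def run_npsearch(input_fasta, **options)
  # Passing separate arguments to system avoids shell interpolation of paths.
  system(*npsearch_command(input_fasta, **options))
end
```

A call such as `run_npsearch('Trinity.fasta', signalp_path: '/path/to/signalp', num_threads: 4)` would then mirror the command line above.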

Debugging

Have an issue? No problem - just try the following to produce a debugging log from NpSearch, then send it to me at [email protected] or raise an issue above.

  1. First, uninstall and reinstall npsearch.
gem uninstall npsearch # Select all when it asks which versions to uninstall
gem install npsearch
npsearch --version # you should see 2.1.4
  2. Ensure all dependencies are installed.
# Check if cd-hit is installed
cd-hit # you should see cd-hit's output
# Check if `getorf` from the EMBOSS package is installed
getorf -version
# you should see: 'EMBOSS: 6.6.0.0'
  3. Rerun your analysis with the debug flag (also specify the temporary directory to be on the safe side).
cd /path/to/analysis/folder
mkdir temp
npsearch -h # to double check whether npsearch works
npsearch -n 10 -s /path/to/signalp/script -d -t /path/to/temp/dir /path/to/Trinity.fasta > debug.log
  4. Raise an issue on the GitHub repository or send me an email at [email protected]. Be sure to attach the debug.log that you have just created and fully explain the issues that you are seeing.
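
The dependency check in the steps above can also be scripted. The following Ruby sketch looks for the `cd-hit` and `getorf` executables on your `PATH`. The method names `which` and `missing_dependencies` are hypothetical helpers, not part of NpSearch. SignalP is not checked because it is passed to NpSearch as an explicit path.

```ruby
# Return the full path of an executable found on PATH, or nil if absent.
def which(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# List the required command-line tools that could not be found.
def missing_dependencies(names = %w[cd-hit getorf])
  names.reject { |name| which(name) }
end
```

If `missing_dependencies` returns an empty array, both tools were found. Otherwise it returns the names of the tools that still need to be installed.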

Note

  • With the current version of NpSearch, there is an issue with the number of threads used - it seems to use more threads than specified on the command line.
  • NpSearch is expected to produce a high system load (as shown in top / htop). This is because NpSearch runs SignalP as a separate process for each sequence (to speed things up). As such, the system load (which reflects the number of processes running or waiting to run) can be higher than expected. This is normally not a reason for concern. However, we will probably try to find a middle ground between speed and the number of processes spawned (or maybe someone could rewrite SignalP in C with multicore support)...
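
One possible middle ground, sketched below in Ruby, is to group sequences into a fixed number of batches so that each external process handles many sequences instead of one. This is only an idea for illustration, not how NpSearch currently works. The method name `batch_records` is hypothetical, and `records` can be any array (for example, FASTA entries).

```ruby
# Split records into at most num_batches roughly equal batches.
# Hypothetical helper; not part of NpSearch.
def batch_records(records, num_batches)
  raise ArgumentError, 'num_batches must be positive' unless num_batches.positive?
  return [] if records.empty?

  batch_size = (records.length / num_batches.to_f).ceil
  records.each_slice(batch_size).to_a
end
```

With ten records and four batches, the batch size is three, so the last batch holds the single remaining record.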