To run lncFunTK on your own datasets, prepare the input files as described in the Input files section, then run lncFunTK as described in the Run section. The analysis results are written to the 07Report directory; for more details, please refer to the Output files section.
The gene expression level at each stage is quantified from RNA-seq data; the expression profiles from the different stages are then combined into a single plain-text matrix (e.g. gene.expr.mat).
Each column represents a gene expression profile in one stage.
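As a rough illustration of combining per-stage profiles into one matrix, here is a minimal Python sketch. The function name `write_expression_matrix`, the in-memory dict input, and the exact layout (gene ID in the first column, no header row, only genes present in every stage) are assumptions made for this example; compare against the demo gene.expr.mat for the layout lncFunTK actually expects.

```python
def write_expression_matrix(stage_profiles, out_path):
    """Write a tab-delimited gene-by-stage matrix.

    stage_profiles: list of dicts, one per stage, mapping gene ID -> expression.
    Assumption: only genes quantified in every stage are kept.
    """
    if not stage_profiles:
        raise ValueError("need at least one stage profile")
    # Keep genes present in all stages, in a stable (sorted) order.
    shared = set(stage_profiles[0])
    for profile in stage_profiles[1:]:
        shared &= set(profile)
    genes = sorted(shared)
    with open(out_path, "w") as out:
        for gene in genes:
            # One column per stage, in the order the profiles were given.
            values = [str(profile[gene]) for profile in stage_profiles]
            out.write("\t".join([gene] + values) + "\n")
    return genes
```

The order of the dicts in `stage_profiles` determines the column order of the matrix.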
TF ChIP-seq reads are aligned to the reference genome, and TF binding peaks are then called with MACS2. The resulting TF binding profiles, in BED format, are used in the lncFunTK analysis.
The list of TF binding profiles must be arranged in a plain-text file: the first column is the TF gene symbol and the second column is the absolute path of the corresponding TF binding profile in BED format (e.g. tf.chipseq.lst (https://github.com/zhoujj2013/lncfuntk/blob/master/demo/test_data/TfBindingProfiles/tf.chipseq.lst)).
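The two-column list described above could be generated from a directory of BED files with a short Python sketch like the one below. The helper `build_tf_list` is hypothetical, and it assumes each BED file is named after its TF gene symbol (for example `Oct4.bed`); adjust the naming rule to match your own files.

```python
from pathlib import Path

def build_tf_list(bed_dir, out_path):
    """Write one 'TF_symbol<TAB>absolute_BED_path' line per BED file in bed_dir."""
    rows = []
    for bed in sorted(Path(bed_dir).glob("*.bed")):
        # Assumption: the file name without ".bed" is the TF gene symbol.
        rows.append(f"{bed.stem}\t{bed.resolve()}")
    with open(out_path, "w") as out:
        for row in rows:
            out.write(row + "\n")
    return rows
```

Calling `build_tf_list("TfBindingProfiles", "tf.chipseq.lst")` would write the list next to your working directory; `Path.resolve()` is used so that the second column is always an absolute path.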
Ago2 CLIP-seq reads are aligned to the reference genome with bowtie2, and potential Ago2 binding peaks are then called with Piranha or by CIMS analysis.
The Ago2 binding peaks, in BED format, are used as an input file for lncFunTK (e.g. miRNA.binding.potential.bed). (Note: the fourth column should be a unique ID.)
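Because the fourth column must hold a unique ID, it can be worth checking the BED file before running the pipeline. The following Python sketch is one way to do that; `duplicated_bed_ids` is a hypothetical helper, not part of lncFunTK, and it assumes a tab-delimited BED file with at least four columns.

```python
from collections import Counter

def duplicated_bed_ids(bed_path):
    """Return the IDs (4th column) that occur more than once in a BED file."""
    counts = Counter()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and optional comment/track/browser header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4:
                raise ValueError(f"fewer than 4 columns: {line!r}")
            counts[fields[3]] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty list means every peak ID is unique; any names returned need to be renamed before the file is used as input.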
Expressed microRNAs are collected by literature search or from small RNA sequencing data analysis. The list of expressed microRNAs is a plain-text file: the first column is the official miRNA gene symbol and the second column is its unique ID in the RefSeq database (e.g. MirRNA_expr_refseq.lst).
lncFunTK can predict, prioritize, and annotate the functions of newly assembled lncRNAs if you provide their coordinates in GTF format as input (e.g. novel.final.gtf). In that case, the expression levels of the newly assembled lncRNAs must be included in the gene expression profiles.
All of the input files above are listed in a configuration file, from which lncFunTK reads its inputs. Briefly, each parameter is written as a key and a value separated by a tab; lines starting with "#" are comments and do not affect the lncFunTK analysis. The configuration file must be formatted as follows (e.g. config.txt):
# setting output dir
OUTDIR ./
# setting output result prefix
PREFIX mESCs
# setting species information (human or mouse)
SPE mouse
# genome version
VERSION mm9
# lncRNA coordinates
LNCRNA ./test_data/novel.final.gtf
# time series transcriptome profiles (multiple datasets, place the major one first, at least 3 datasets)
EXPR ./test_data/GeneExpressionProfiles/gene.expr.mat
# the expression profile column corresponding to the cell stage for which you want
# to predict long noncoding RNAs
EXPRCUTOFF 0.5
PCCCUTOFF 0.95
# TF binding peaks from TF chipseq (multiple datasets, at least the key tfs)
CHIP ./test_data/TfBindingProfiles/tf.chipseq.lst
PROMTER 10000,5000
# Ago2 binding site from CLIP-seq (1 dataset)
CLIP ./test_data/MirnaBindingProfiles/miRNA.binding.potential.bed
EXTEND 100
# expressed miRNA list, must be official gene symbol and corresponding transcript
# ID (with NR_ prefix)
MIRLIST ./test_data/MirRNA_expr_refseq.lst
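To illustrate the key/value layout described above (tab-delimited pairs, "#" for comments), here is a minimal Python sketch of a reader for such a file. It is not lncFunTK's own parser; `read_config` is a hypothetical helper that can be used to sanity-check a config.txt before running the pipeline.

```python
def read_config(path):
    """Parse a tab-delimited KEY<TAB>value file, ignoring '#' comments."""
    config = {}
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            # Blank lines and comment lines carry no parameters.
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            key, sep, value = line.partition("\t")
            if not sep:
                raise ValueError(f"line {number}: key and value must be tab-separated")
            config[key.strip()] = value.strip()
    return config
```

For example, `read_config("config.txt")["PREFIX"]` would return `mESCs` for the file shown above, and a line that separates key and value with spaces instead of a tab raises an error.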
Once the input files and the configuration file are prepared, you can run lncFunTK as follows:
cd project_dir
# create makefile
perl lncfuntk_dir/run_lncfuntk.pl config.txt
# then make the file
make
# around 2-3 hours
# you can check the report (index.html) in 07Report.
firefox ./07Report/index.html
The lncFunTK analysis results are placed in the 07Report directory.
You can view the lncFunTK analysis results with:
firefox index.html # or open index.html from within the Firefox browser
This plain text file contains the co-expression network obtained by co-expression analysis of the expression profiles.
01CoExprNetwork/prefix.CoExpr.int
Format:
gene1<tab>gene2<tab>interaction_type<tab>score<tab>evidence
...
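The five-column layout above can be read with a few lines of Python. This is only a sketch: the `Interaction` tuple and `read_interactions` function are names invented for the example, all fields are kept as text because the exact type of `score` is not specified here, and lines that do not have exactly five tab-separated fields are skipped.

```python
from collections import namedtuple

Interaction = namedtuple(
    "Interaction", "gene1 gene2 interaction_type score evidence"
)

def read_interactions(path):
    """Return the interactions listed in a 5-column, tab-delimited .int file."""
    interactions = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Assumption: well-formed records have exactly five fields.
            if len(fields) != 5:
                continue
            interactions.append(Interaction(*fields))
    return interactions
```

A typical use is to collect the partners of one gene, for example `[i.gene2 for i in read_interactions(path) if i.gene1 == "Sox2"]`.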
This plain text file contains the TF regulatory network obtained by analyzing multiple TF binding profiles.
02TfNetwork/TfNetwork.int
The format is the same as that of the co-expression network described in 4.2.
This plain text file contains the microRNA-gene interactions obtained by analyzing the Ago2 CLIP binding profile.
03MirnaNetwork/prefix.MirTargetGeneLevel.txt
The format is the same as that of the co-expression network.
This file contains the integrative gene regulatory network created by lncFunTK.
07Report/GeneRegulatoryNetwork.interaction.txt
The format is the same as that of the co-expression network.
Predicted functional lncRNAs and the corresponding FIS (Functional Information Score):
05FunctionalityPrediction/functional.lncrna.lst
05FunctionalityPrediction/nonfunctional.lncrna.lst
The format:
lncRNA_id1<tab>FIS1 (the Functional Information Score (FIS) of the corresponding lncRNA)
lncRNA_id2<tab>FIS2
...
lncRNA_idN<tab>FISN
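Since each line pairs an lncRNA ID with its FIS, the lists are easy to rank. The Python sketch below shows one way to do it; `read_fis` is a hypothetical helper, and it assumes the score column is numeric and the file has no header row.

```python
def read_fis(path):
    """Return (lncRNA_id, FIS) pairs sorted from highest to lowest FIS."""
    scores = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Assumption: the second column parses as a floating-point number.
            scores.append((fields[0], float(fields[1])))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores
```

For instance, `read_fis("05FunctionalityPrediction/functional.lncrna.lst")[:10]` would give the ten highest-scoring lncRNAs.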
GO annotation results for each predicted functional lncRNA:
07Report/FunctionalLncRNA.txt
The format:
id1<tab>FIS1<tab>GoTermId<tab>GO DESC<tab>pvalue<tab>adjust-pvalue<tab>neighbor genes
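As a sketch of how this seven-column table might be consumed, the Python snippet below groups GO terms by lncRNA. The column names in `COLUMNS` and the function `go_terms_by_lncrna` are labels chosen for this example, and it assumes one GO term per line with the neighbor genes stored in a single tab-delimited field; lines with a different number of fields are skipped.

```python
from collections import defaultdict

# Illustrative names for the seven columns listed above.
COLUMNS = ("lncrna_id", "fis", "go_id", "go_desc", "pvalue", "adj_pvalue", "neighbors")

def go_terms_by_lncrna(path):
    """Map each lncRNA ID to the list of (GO term ID, description) pairs."""
    terms = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(COLUMNS):
                continue  # not a seven-column record
            record = dict(zip(COLUMNS, fields))
            terms[record["lncrna_id"]].append((record["go_id"], record["go_desc"]))
    return dict(terms)
```

If the file starts with a seven-column header row, that row will appear as an ordinary record and should be dropped by the caller.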