corrPNG : A tool for calculating the correlation coefficient between Phenotype data and the Number of Genes.
#Calculate Pearson correlation coefficient
python corrPNG.py pearson -i genePAV.Rtab -p pheno -o output -t 0.7
#Calculate Spearman rank correlation coefficient
python corrPNG.py spearman -i genePAV.Rtab -p pheno -o output -t 0.7
#Calculate Kendall rank correlation coefficient
python corrPNG.py kendall -i genePAV.Rtab -p pheno -o output -t 0.7
#Visualize the correlation as a scatter plot
python corrPNG.py plot -i genePAV.Rtab -p pheno.tsv -o output -g gene_id -n phenotype_id -r
#Change the column notation of tsv file
python corrPNG.py sort_column -i genePAV.Rtab -o output_sort -s samplelist
#Swap rows and columns (transpose)
python corrPNG.py transpose_table -i input.tsv -o output.tsv
The following software must be installed on your machine:
Python: tested with version 3.8
Python modules: matplotlib, pandas, scipy, and seaborn
corrPNG.py calculates the correlation coefficient between phenotype data and gene copy number variation, including gene presence/absence. Run the following commands in pangene to create the tab-delimited input files.
k8 pangene.js gfa2matrix graph.gfa > genePAV.Rtab
k8 pangene.js gfa2matrix -c graph.gfa > geneCNV.Rtab
Use the resulting tab-delimited file as input for corrPNG.py. Phenotype data can be supplied as a CSV or TSV file. Arrange it so that each row is a phenotype and each column is a sample. corrPNG.py has three commands for calculating correlation coefficients:
pearson for Pearson's coefficient (r),
spearman for Spearman's rho (ρ), and
kendall for Kendall's tau (τ).
Each command writes an output file containing the coefficients and p-values that exceed the threshold. The threshold is a single value (0.7 by default) and is applied in both the positive and negative direction (e.g. 0.7 and -0.7).
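The thresholding described above can be sketched with pandas and scipy. This is only an illustration under stated assumptions, not the code inside corrPNG.py: the function name `correlate`, the toy tables, the output column names, and the use of `>=` for the cutoff are all hypothetical.

```python
import pandas as pd
from scipy import stats

# Toy data (hypothetical): rows are genes or phenotypes, columns are samples.
genes = pd.DataFrame(
    {"s1": [1, 0, 2], "s2": [1, 1, 2], "s3": [0, 1, 2], "s4": [0, 2, 2]},
    index=["geneA", "geneB", "geneC"],
)
pheno = pd.DataFrame(
    {"s1": [9.0], "s2": [8.0], "s3": [3.0], "s4": [1.0]},
    index=["height"],
)

# Map each command name to the matching scipy.stats function.
METHODS = {
    "pearson": stats.pearsonr,
    "spearman": stats.spearmanr,
    "kendall": stats.kendalltau,
}


def correlate(genes, pheno, method="pearson", threshold=0.7):
    """Return gene/phenotype pairs whose |coefficient| reaches the threshold."""
    func = METHODS[method]
    # Only samples present in both tables can be compared.
    samples = genes.columns.intersection(pheno.columns)
    rows = []
    for gene_id, g in genes[samples].iterrows():
        if g.nunique() < 2:
            continue  # constant row: the coefficient is undefined
        for pheno_id, p in pheno[samples].iterrows():
            coef, pval = func(g.to_numpy(dtype=float), p.to_numpy(dtype=float))
            # One threshold covers both directions (e.g. 0.7 and -0.7).
            if abs(coef) >= threshold:
                rows.append((gene_id, pheno_id, coef, pval))
    return pd.DataFrame(rows, columns=["gene", "phenotype", "coefficient", "p_value"])


result = correlate(genes, pheno, method="pearson", threshold=0.7)
print(result)
```

In this toy input, geneA is positively correlated and geneB negatively correlated with the phenotype, so both pass the single threshold, while the constant geneC is skipped.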
plot generates a correlation diagram (scatter plot only) between one gene and one phenotype. Specify the gene name with -g and the phenotype name with -n. The -r option draws a regression line. The -jx and -jy options produce a jitter plot.
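A scatter plot of this kind can be sketched with seaborn's `regplot`. This is a minimal sketch, not the plotting code of corrPNG.py: the function name `scatter_gene_vs_pheno`, the toy tables, and the mapping of -r, -jx and -jy onto `fit_reg`, `x_jitter` and `y_jitter` are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Toy data (hypothetical): rows are genes or phenotypes, columns are samples.
genes = pd.DataFrame(
    {"s1": [0], "s2": [1], "s3": [1], "s4": [2]},
    index=["geneB"],
)
pheno = pd.DataFrame(
    {"s1": [9.0], "s2": [8.0], "s3": [3.0], "s4": [1.0]},
    index=["height"],
)


def scatter_gene_vs_pheno(genes, pheno, gene_id, pheno_id,
                          regression=False, x_jitter=None, y_jitter=None):
    """Scatter one gene (x axis) against one phenotype (y axis)."""
    samples = genes.columns.intersection(pheno.columns)
    data = pd.DataFrame({
        gene_id: genes.loc[gene_id, samples].astype(float),
        pheno_id: pheno.loc[pheno_id, samples].astype(float),
    })
    # fit_reg toggles the regression line; the jitter arguments add random
    # noise to the drawn points only, which helps when gene counts overlap.
    return sns.regplot(data=data, x=gene_id, y=pheno_id,
                       fit_reg=regression, x_jitter=x_jitter, y_jitter=y_jitter)


ax = scatter_gene_vs_pheno(genes, pheno, "geneB", "height",
                           regression=True, x_jitter=0.1)
```

The returned axes can be written to disk with `ax.figure.savefig(...)`. Jitter is useful here because gene counts are small integers, so many samples would otherwise be drawn on top of each other.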
sort_column lets you arbitrarily change the column notation of the genePAV.Rtab file obtained from gfa2matrix in pangene.
transpose_table swaps the rows and columns of a CSV or TSV file.
Please include the file extension in the option values for these two commands.
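Swapping rows and columns can be sketched in a few lines of pandas. The helper names `delimiter_for` and `transpose_table` are hypothetical, and choosing the delimiter from the file extension is an assumption about why the extension is needed, not documented behavior of corrPNG.py.

```python
import io
from pathlib import Path

import pandas as pd


def delimiter_for(path):
    """Pick a delimiter from the extension (assumed convention)."""
    return "," if Path(path).suffix.lower() == ".csv" else "\t"


def transpose_table(src, sep="\t"):
    """Read a table whose first column holds row names and swap rows and columns."""
    table = pd.read_csv(src, sep=sep, index_col=0)
    return table.T


# In-memory stand-in for a small tab-delimited file (hypothetical content).
tsv = "gene\ts1\ts2\ngeneA\t1\t0\ngeneB\t2\t1\n"
flipped = transpose_table(io.StringIO(tsv), sep=delimiter_for("input.tsv"))
print(flipped)
```

After transposing, samples become rows and genes become columns, which is the layout to use when a downstream tool expects one sample per row.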