Stefan's imputation accuracy package. And with helper functions for working with AlphaSuite's SNP files. Implemented in Fortran for speed.
This package was developed for working with SNP files in the format used in AlphaSuite. The format of these consists of first column with sample ID (Individual or Animal ID), and subsequent column counting minor alleles at each loci:
1001 0 0 1 2 1 2
1002 0.05 0.50 1.50 1.80 0.95 9
In this example, the file consists of two individuals (1001 and 1002); first individual is has genotypes (0
,0
,1
,2
,1
,2
) for six loci. The second individual has been imputed, and the genotypes are given as real values of best guess. The last genotype was not sufficiently imputed and therefore set as missing with value 9
.
Requirements for format: Space separated; number of digits is unimportant.
In R, load package with library(Siccuracy)
. There exists a number of functions:
- Calculate imputation accuracy (i.e. correlation between two genotype matrices):
imputation_accuracy
- Calculate hetereozygosity:
heterozygosity
- Helper functions:
get_ncol
,get_firstcolumn
,get_nlines
- Combining files:
rowconcatenate
andmergeChips
- Reading and writing files:
write.snps
,read.snps
- Convert phased files to genotype files:
phasotogeno
Most functions operate directly with files, not matrices loaded into memory!
Visiti https://github.com/stefanedwards/Siccuracy/releases to download the package for installation in R. Zip-files are available for Windows users.
Alternatively,
use devtools
to install directly from github:
#!r
library(devtools)
install_github(repo='stefanedwards/Siccuracy', ref = "master")
Lucky you.
Please report it using the issues tool.
Presumable.
The rowconcatenate
can be done by using paste
in bash -- except it requires that you first strip off
first column of all files except the first.