3. Reformatting and Harmonization

Sander W. van der Laan edited this page Dec 10, 2019 · 2 revisions

GWAS datasets are first cut into chunks of 125,000 variants* by metagwastoolkit.run.sh, and subsequently parsed and harmonized by gwas.parser.R and gwas2ref.harmonizer.py, respectively.
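To illustrate the chunking step, here is a minimal Python sketch. It is not the code in metagwastoolkit.run.sh; the function name `chunk_variants` and the choice to repeat the header line in every chunk are assumptions made for this example only.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_variants(lines: Iterable[str], chunk_size: int = 125_000) -> Iterator[List[str]]:
    """Yield lists of lines holding at most `chunk_size` variants each.

    Assumption: the first line is a header, and it is repeated at the
    top of every chunk so each chunk can be parsed on its own.
    """
    iterator = iter(lines)
    header = next(iterator, None)
    if header is None:
        return  # empty input, nothing to chunk
    while True:
        body = list(islice(iterator, chunk_size))
        if not body:
            break
        yield [header] + body
```

For example, `list(chunk_variants(open_file, 125_000))` on a file with 300,000 variants would give three chunks: two with 125,000 variants and one with the remaining 50,000.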

During parsing, the GWAS dataset is re-formatted to fit the downstream pipeline. In addition, some variables are calculated if they are not already present, for instance "minor allele frequency (MAF)" and "minor allele count (MAC)".
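As a rough sketch of how such derived variables can be computed, the example below uses the common definitions MAF = min(EAF, 1 - EAF) and MAC = 2 × N × MAF for diploid samples. Whether gwas.parser.R uses exactly these formulas is an assumption, and the names `minor_allele_stats`, `eaf`, and `n_samples` are hypothetical.

```python
from typing import Tuple


def minor_allele_stats(eaf: float, n_samples: int) -> Tuple[float, float]:
    """Return (MAF, MAC) from an effect allele frequency and a sample size.

    Assumptions: diploid samples, so each sample carries two alleles,
    and `eaf` is a frequency between 0 and 1.
    """
    if not 0.0 <= eaf <= 1.0:
        raise ValueError("effect allele frequency must lie in [0, 1]")
    maf = min(eaf, 1.0 - eaf)    # the minor allele is the rarer of the two
    mac = 2 * n_samples * maf    # expected number of minor-allele copies
    return maf, mac
```

With an effect allele frequency of 0.8 in 1,000 samples, this gives a MAF of about 0.2 and a MAC of about 400.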

During harmonization, the parsed dataset is compared to a reference (see below), and certain information from the reference is added to the parsed data.
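The idea of looking up each parsed variant in a reference can be sketched as follows. This is only an illustration, not the logic of gwas2ref.harmonizer.py: matching on chromosome and position, and the field names `chr`, `pos`, `variant_id`, and `ref_variant_id`, are all assumptions for this example.

```python
from typing import Dict, List, Optional, Tuple

Variant = Dict[str, object]
Key = Tuple[str, int]


def harmonize(parsed: List[Variant], reference: Dict[Key, Variant]) -> List[Variant]:
    """Annotate each parsed variant with information from a reference.

    Assumption: variants are matched on (chromosome, position); variants
    missing from the reference are kept and annotated with None.
    """
    harmonized = []
    for row in parsed:
        key = (str(row["chr"]), int(row["pos"]))
        ref: Optional[Variant] = reference.get(key)
        merged = dict(row)  # copy, so the parsed input is left unchanged
        merged["ref_variant_id"] = ref["variant_id"] if ref else None
        harmonized.append(merged)
    return harmonized
```

Keeping unmatched variants, rather than dropping them, is a design choice of this sketch; it makes it easy to count afterwards how many variants were absent from the reference.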

Finally, gwas.wrapper.sh will automagically wrap up all the parsed and harmonized data into two separate datasets: dataset.pdat for the parsed data, and dataset.rdat for the harmonized data.

* You can change this number in the metagwastoolkit.conf file.