SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients
SynTwin is a methodology for generating and using digital twins for clinical outcome prediction. It combines synthetic data and network science to create digital twins for precision medicine.
For this study we chose a population-based cancer registry from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). SEER offers a large sample size and is easy to access: registration requires only an email address, which supports reproducibility.
Follow the steps in SynTwin to reproduce the work from the paper. step2_mpom_synthetic_dataset can be replaced with whichever synthetic data generation algorithm works best for your data. We evaluated three synthetic data generation algorithms: categorical latent Gaussian process (CLGP), mixture of product of multinomials (MPoM), and multi-categorical medical generative adversarial network (MC-MedGAN), using the code from SYNDATA and multi-categorical-gans. See synthetic_algorithms_comparison for details.
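To give a feel for what a replacement for the MPoM step has to produce, here is a minimal sketch of sampling categorical records from a mixture of product of multinomials. It is not the code in step2_mpom_synthetic_dataset or SYNDATA: the function name `sample_mpom`, its parameters, and the toy probabilities are all hypothetical, and the model parameters are supplied by hand rather than fitted to SEER data.

```python
import numpy as np


def sample_mpom(weights, feature_probs, n_samples, seed=0):
    """Draw synthetic categorical records from a mixture of product of multinomials.

    weights: length-K mixture weights that sum to 1.
    feature_probs: one (K, n_levels_j) array per feature; row k is the
        category distribution of that feature inside component k.
    (Hypothetical helper for illustration only.)
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Pick a latent mixture component for every synthetic record.
    components = rng.choice(len(weights), size=n_samples, p=weights)
    records = np.empty((n_samples, len(feature_probs)), dtype=int)
    for j, probs in enumerate(feature_probs):
        probs = np.asarray(probs, dtype=float)
        for i, k in enumerate(components):
            # Given the component, each feature is an independent categorical draw.
            records[i, j] = rng.choice(probs.shape[1], p=probs[k])
    return components, records


# Toy parameters (assumed, not estimated from any real registry).
weights = [0.6, 0.4]
feature_probs = [
    [[0.9, 0.1], [0.2, 0.8]],            # feature 0 has 2 levels
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],  # feature 1 has 3 levels
]
components, synthetic = sample_mpom(weights, feature_probs, n_samples=5)
print(synthetic.shape)  # (5, 2)
```

Any alternative generator that returns a table of synthetic patients with the same categorical columns as the real data should be able to stand in at this step.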
Moore JH, Li X, Chang J-H, Tatonetti NP, Theodorescu D, Chen Y, Asselbergs F, Venkatesan M, Wang Z. Pacific Symposium on Biocomputing, in press (2024).