Skip to content

Repository containing the code used in "All-cause and cause-specific mortality in respiratory symptom clusters: a population-based multicohort study"

License

Notifications You must be signed in to change notification settings

1151k/respiratory-symptom-clusters

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

R scripts used in

All-cause and cause-specific mortality in respiratory symptom clusters: a population-based multicohort study

Daniil Lisik, Helena Backman, Hannu Kankaanranta, Rani Basna, Linnea Hedman, Linda Ekerljung, Fredrik Nyberg, Anne Lindberg, Göran Wennergren, Eva Rönmark, Bright I. Nwaru, Lowie Vanfleteren
Under review

Description of R scripts

  • VARIABLES.r: define variables across cohorts and their characteristics (e.g., variables for cluster analysis)
  • CONSTANTS.r: define constants used in multiple R scripts (e.g., folder paths)
  • draw-dag.r: draw a directed acyclic graph (DAG)
  • clean-and-combine-datasets.r: do cleaning/preprocessing and merge data from the cohorts with standardized variable names
  • assess-pre-imputation-data.r: tabulate data pre-imputation by cohort
  • assess-missingness.r: quantify and visualize missingness in the data
  • select-random-dataset.r select random dataset for testing stability by random removal of subjects
  • impute.r: impute missing data (and assess validity of imputation), and set composite variables
  • pool-imputations.r: pool imputed datasets
  • add-comorbidity-data.r: add comorbidity register data and tabulate the results
  • assess-pre-post-imputation-data.r: tabulate data pre- and post-imputation (as well as tabulation of relevant register [comorbidity] data)
  • do-descriptive-statistics.r: generate descriptive statistics, primarily regarding respiratory symptoms frequency
  • assess-correlation.r: quantify and visualize correlation between potential cluster-defining variables
  • prepare-data-for-cluster-analysis.r: based on above findings, prepare data to be used in cluster analysis
  • lsh.r: perform cluster analysis
  • pool-cluster-labels.r: pool cluster labels
  • assess-clusters.r: tabulate data by clusters
  • add-mortality-data.r: prepare data to be used in mortality analysis
  • mortality-analysis.r: perform mortality analysis and extract results therefrom
  • pool-hazard-ratios-and-confidence-intervals.r: pool hazard ratios (HR) and corresponding 95% confidence intervals (95%CI)
  • catboost.r: assess feature importance by gradient boosting on decision trees
  • plot-cost.r: across all the cluster analyses across all datasets, pool and plot the clustering cost (total within-cluster distance)

Data availability

Due to regulations and contractual agreement with study participants, the underlying data are not available.


Contact

For any inquiries, please contact Daniil Lisik ([email protected]).


About

Repository containing the code used in "All-cause and cause-specific mortality in respiratory symptom clusters: a population-based multicohort study"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published