This repository contains the documentation for reproducibility of the study "Indoor dust microbiota composition and allergic diseases: a scoping review to produce a reusable DAG".
This study has been presented at:
- Weekly department meeting on 28/05/2024. Slides
- Veterinary Science Day on 06/06/2024. Poster
You can contact me or post a request on this repository if you encounter any issues, or if you'd like to discuss or ask a question about how to use this resource.
To replicate these analyses, I suggest following these steps:
- Install R and RStudio on your computer if you haven't done so already. (Note that the analyses in parts 1 to 4 were conducted under R version 4.4.0 and RStudio 2024.04.1, while the analyses in parts 5 and 6 were done in R version 4.2.2.)
- Clone this repository. If you do not know how to do this, you can follow these instructions. Alternatively, you can download this repository, unpack it, and place it in a folder on your computer.
- You should now have all these files on your computer with an identical folder structure (described in the following section).
- In the main directory, open the file named Dust-Microbiome-Review.Rproj in RStudio.
- You can navigate through the folders in the bottom-right panel of RStudio. Open the R folder. You should now see a series of files ending with .qmd.
- Open one of these files. You can run every chunk of code sequentially to reproduce the analyses. Make sure to respect the order; if something fails, I recommend running all chunks of code again from the beginning. If you don't know how to run a chunk of code, you can imitate what this person is doing. If you get a message saying "Access denied", switch from Visual to Source mode with Ctrl+Shift+F4. NOTE: R scripts in the scripts folder are sourced into the main .qmd files, which is why they are not meant to be used individually within this project.
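Before running the chunks, it can help to confirm the setup described in the steps above. The following is a minimal sketch in base R and is not part of the repository's own code: the helper name `check_setup` is hypothetical, and it only looks for the `.Rproj` file and the `.qmd` files in the `R` folder mentioned above.

```r
# Hypothetical helper (not part of this repository): checks that a folder
# looks like the cloned project described in the steps above.
check_setup <- function(project_dir = ".") {
  rproj <- file.path(project_dir, "Dust-Microbiome-Review.Rproj")
  qmd_files <- list.files(file.path(project_dir, "R"), pattern = "\\.qmd$")
  list(
    r_version   = as.character(getRversion()),  # compare with the versions listed above
    rproj_found = file.exists(rproj),
    qmd_files   = qmd_files
  )
}

# Run from the main directory of the project
setup <- check_setup(".")
str(setup)
```

If `rproj_found` is `FALSE` or `qmd_files` is empty, the working directory is probably not the main directory of the project.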
I recommend opening and running the .qmd files in sequential order, although some users may only be interested in one part of the analyses. If you are not able to follow the prior steps, you may also consider reviewing the PDF reports documenting the analyses. The suggested sequence for reviewing the flow of analysis is the following:
- Part 1. Preparation of data and saving into formats readable by other statistical software (.csv, .dta, .sav, .xpt) for greater reusability. PDF
- Part 2. Descriptive analyses. PDF
- Part 3. Analysis of studies citing the ESC-DAG method in the last two years. PDF
- Part 4. Mapping procedure, for the reconstruction of implied graphs from studies assessing the relationship between the indoor dust microbiome and allergic diseases. PDF
- Part 5. Pooled analysis of the indoor dust bacterial microbiome of 5 studies included in the review plus previously unpublished data from households in the Netherlands. PDF
- Part 6. Pooled analysis of the indoor dust fungal microbiome of 3 studies included in the review plus previously unpublished data from households in the Netherlands. PDF
Note: I have not provided the data needed to reproduce parts 5 and 6 in this repository, because the best way to share these data still needs to be discussed internally.
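As an illustration of what Part 1 describes (saving data in formats readable by other statistical software), here is a minimal sketch. It is an assumption that the `haven` package is a suitable tool for the .dta, .sav and .xpt formats; the Part 1 report documents what was actually done. The data frame and file names are made up for the example and do not come from the study.

```r
# Made-up example data; not taken from the study
example_data <- data.frame(
  study_id  = c("s1", "s2"),
  n_samples = c(10L, 25L)
)

# Write to a temporary folder so nothing in the project is touched
out_dir <- file.path(tempdir(), "export_demo")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# .csv needs only base R
write.csv(example_data, file.path(out_dir, "example.csv"), row.names = FALSE)

# Stata, SPSS and SAS transport files: assumes the haven package is installed
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_dta(example_data, file.path(out_dir, "example.dta"))
  haven::write_sav(example_data, file.path(out_dir, "example.sav"))
  haven::write_xpt(example_data, file.path(out_dir, "example.xpt"))
}

list.files(out_dir)
```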
The project structure distinguishes three kinds of folders:
- read-only (RO): edited by neither the code nor the researcher.
- human-writeable (HW): edited by the researcher only.
- project-generated (PG): folders generated when running the code; these folders can be deleted or emptied and will be completely reconstituted as the project is run.
.
├── .gitignore
├── CITATION.cff
├── LICENSE
├── README.md
├── Dust-Microbiome-Review.Rproj
├── data <- All project data files
│ ├── processed <- The final, canonical data sets. (PG)
│ ├── raw <- The original, immutable data. (RO)
│ └── temp <- Intermediate data that has been transformed. (PG)
├── docs <- Documentation for users (HW)
│ ├── manuscript <- Manuscript source, docx. (HW)
│ ├── presentations <- Powerpoint presentations, pptx. (HW)
│ └── reports <- Project reports, pdf. (HW)
├── results
│ ├── output_figures <- Figures for the manuscript or reports (PG)
│ └── output_tables <- Output tables for the manuscript (PG)
├── R <- Source code for this project (HW)
│ ├── scripts <- Scripts sourced in main R markdown documents (HW)
│ └── sessions <- Text files with information of R sessions (PG)
└── renv <- Packaging dependencies (RO)
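Because project-generated folders can be deleted or emptied, a fresh copy of the project may lack some of them. The sketch below shows one way to recreate missing ones before running the code. It is not part of the repository: `ensure_pg_folders` is a hypothetical helper, and the folder list is read off the tree above.

```r
# Folders described as project-generated in the tree above.
# R/scripts is deliberately left out: it holds scripts that the .qmd files source.
pg_folders <- c(
  "data/processed",
  "data/temp",
  "results/output_figures",
  "results/output_tables",
  "R/sessions"
)

# Hypothetical helper: creates any missing project-generated folder and
# returns, per folder, whether it had to be created.
ensure_pg_folders <- function(project_dir = ".") {
  paths <- file.path(project_dir, pg_folders)
  created <- vapply(paths, function(p) {
    if (dir.exists(p)) FALSE else dir.create(p, recursive = TRUE)
  }, logical(1))
  invisible(created)
}

# Demonstration in a temporary folder; use "." from the main project directory
demo_dir <- file.path(tempdir(), "structure_demo")
print(ensure_pg_folders(demo_dir))
```

Existing folders and their contents are left untouched, so the helper can be called repeatedly.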
The full documentation of the statistical analyses, with comments, can be found in the reports folder. These reports state the operating system, R version, and package versions needed to reproduce each part of the analyses. I will add a lockfile with the package dependencies to the renv folder later on.
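To compare your own environment with the one documented in the reports, you can record your session details in a text file, similar in spirit to the files kept in the sessions folder. This is a minimal base R sketch; `save_session_info` is a hypothetical helper and the output file name is made up for the example.

```r
# Hypothetical helper: writes the current R session details
# (R version, operating system, attached packages) to a text file.
save_session_info <- function(path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

# Written to a temporary folder here; any writable path works
session_file <- save_session_info(file.path(tempdir(), "session_example.txt"))
cat(readLines(session_file), sep = "\n")
```

Once a lockfile is available, `renv::restore()` is the usual renv call for installing the package versions it records.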
This project is licensed under the terms of the MIT License.
The project structure is adapted from the Good Enough Project Cookiecutter template by Barbara Vreede (2019).