This README overviews running the MIMIC-OMOP ETL from the ground up on a PostgreSQL server. You will need an installation of PostgreSQL 9.6+ in order to run the ETL. You will also need MIMIC-III installed on this instance of postgres, see here for details: https://mimic.physionet.org/gettingstarted/dbsetup/
This README will assume the following:
- MIMIC-III v1.4 is available in the
mimic
database under themimiciii
schema - The standard concepts from Athena have been downloaded and are available somewhere (including running the extra script to download CPT code definitions)
- The R software library with remotes (
install.packages("remotes")
) and a GitHub packageremotes::install_github("r-dbi/RPostgres")
- The computer has an active internet connection (needed to clone certain repositories throughout the build)
To simplify the reusability of these scripts, we define a few environment variables and reuse them throughout the rest of the guide.
First is the connection string used to connect to the database. Note this also specifies the schema using search_path
.
export OMOP_SCHEMA='omop'
export OMOP='host=localhost dbname=postgres user=postgres options=--search_path='$OMOP_SCHEMA
export MIMIC='host=localhost dbname=postgres user=postgres options=--search_path=mimiciii'
# or, e.g., export OMOP="dbname=mimic options=--search_path=omop"
We will later use these environmental variables to connect to postgres.
NOTE: While ideally specifying the schemas would be completely configurable, much of the repository assumes the omop
schema. The mimiciii
schema should be configurable, but this is experimental.
See the omop/build-omop/postgresql/README.md file to build OMOP on postgres.
We need to create a concept_id
for each MIMIC-III local code. OMOP reserves concept_id
above 20,000,000+ for local codes, so we will use this range to insert ours.
psql "$MIMIC" -f mimic/build-mimic/postgres_create_mimic_id.sql
N.B. this script is called by etl_sequence.sql
After this, every table in the MIMIC-III schema will have an additional column called mimic_id
.
First prepare a configuration file for the R script and save it as mimic-omop.cfg
in the root folder of this repository. Here is an example of the file structure:
dbname=mimic
user=alistairewj
After that, run the R script from the root folder:
Rscript etl/ConceptTables/loadTables.R mimiciii
This will load various manual mappings to the database under the mimiciii
schema.
Be sure to run this from the root folder of the repository, or the relative path names will cause errors.
psql "$MIMIC" --set=OMOP_SCHEMA="$OMOP_SCHEMA" -f "etl/etl.sql"
In order to run the checks, you'll need pgTap. pgTap is a testing framework for postgres. You can install pgtap by either:
- using a package manager, e.g. on Ubuntu using:
sudo apt-get install pgtap
. - from source, following the install instructions here
If building from source, pay careful attention to the make output. You may need to install additional perl modules in order to use functions such as pg_prove, using cpan TAP::Parser::SourceHandler::pgTAP
.
You may also need to run the installation make
files as the postgres user, who has superuser privileges to the postgres database.
After you install it, be sure to enable the extension as follows:
psql "$MIMIC" -c "CREATE EXTENSION pgtap;"
Now the extension is available database-wide, and we can run the ETL.
psql "$MIMIC" -f "etl/check_etl.sql"