Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES): descriptive statistics and data quality checks for OMOP CDM v5 databases

Vignette: Running Achilles on Your CDM

Achilles consists of several parts:

- Precomputations (for database characterization)
- Achilles Heel for data quality
- Export feature for AchillesWeb (or, Atlas Data Sources can read the Achilles tables directly)
- Index generation for better performance with Atlas Data Sources

Achilles is actively being developed for CDM v5.x only.

Getting Started

(Please review the Achilles Wiki for specific details for Linux)

Make sure you have your data in the OMOP CDM v5.x format (https://github.com/OHDSI/CommonDataModel).
This package makes use of rJava. Make sure that you have Java installed. If you don't have Java already installed on your computer (on most computers it already is installed), go to java.com to get the latest version. If you are having trouble with rJava, this Stack Overflow post may assist you when you begin troubleshooting.

In R, use the following commands to install Achilles.

if (!require("devtools")) install.packages("devtools")

# To install the master branch
devtools::install_github("OHDSI/Achilles")

# To install latest release (if master branch contains a bug for you)
# devtools::install_github("OHDSI/Achilles@*release")  

# To avoid Java 32 vs 64 issues 
# devtools::install_github("OHDSI/Achilles", args="--no-multiarch")

To run the Achilles analysis, first determine if you'd like to run the function in multi-threaded mode or in single-threaded mode. Use runCostAnalysis = FALSE to save on execution time, as cost analyses tend to run long.

In multi-threaded mode

The analyses are run in multiple SQL sessions. To enable this mode, set numThreads to a value greater than 1 and scratchDatabaseSchema to something other than #. For example, 10 threads means 10 independent SQL sessions. Intermediate results are written to scratch tables before finally being combined into the final results tables. Scratch tables are permanent tables; you can either have Achilles drop these tables (dropScratchTables = TRUE) or drop them yourself at a later time (dropScratchTables = FALSE). Dropping the scratch tables can add time to the full execution. If desired, you can set your own custom prefix for all Achilles analysis scratch tables (tempAchillesPrefix) and/or for all Achilles Heel scratch tables (tempHeelPrefix).

In single-threaded mode

The analyses are run in one SQL session and all intermediate results are written to temp tables before finally being combined into the final results tables. Temp tables are dropped once the package is finished running. Single-threaded mode can be invoked by either setting numThreads = 1 or scratchDatabaseSchema = "#".
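The rule for choosing between the two modes can be sketched in a few lines of base R. The helper name resolveThreadMode is hypothetical and is not part of the Achilles package; it only mirrors the conditions described above.

```r
# Illustrative helper (an assumption, not Achilles code): reports which mode
# the settings described above would select.
resolveThreadMode <- function(numThreads = 1, scratchDatabaseSchema = "#") {
  # Single-threaded if either condition from the text holds.
  if (numThreads == 1 || scratchDatabaseSchema == "#") {
    return("single-threaded")
  }
  "multi-threaded"
}

resolveThreadMode(numThreads = 1, scratchDatabaseSchema = "scratch")
resolveThreadMode(numThreads = 10, scratchDatabaseSchema = "#")
resolveThreadMode(numThreads = 10, scratchDatabaseSchema = "scratch")
```

Only the last call satisfies both requirements for multi-threaded mode: more than one thread and a real scratch schema.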

Use the following commands in R:
```
library(Achilles)
connectionDetails <- createConnectionDetails(
  dbms="redshift", 
  server="server.com", 
  user="secret", 
  password='secret', 
  port="5439")
```
Single-threaded mode
```
achilles(connectionDetails, 
  cdmDatabaseSchema = "cdm5_inst", 
  resultsDatabaseSchema="results",
  vocabDatabaseSchema = "vocab",
  numThreads = 1,
  sourceName = "My Source Name", 
  cdmVersion = "5.3.0",
  runHeel = TRUE,
  runCostAnalysis = TRUE)
```
Multi-threaded mode
```
achilles(connectionDetails, 
  cdmDatabaseSchema = "cdm5_inst", 
  resultsDatabaseSchema = "results",
  scratchDatabaseSchema = "scratch",
  vocabDatabaseSchema = "vocab",
  numThreads = 10,
  sourceName = "My Source Name", 
  cdmVersion = "5.3.0",
  runHeel = TRUE,
  runCostAnalysis = TRUE)
```
The "cdm5_inst" cdmDatabaseSchema parameter, "results" resultsDatabaseSchema parameter, and "scratch" scratchDatabaseSchema parameter are the fully qualified names of the schemas holding the CDM data, targeted for result writing, and holding the intermediate scratch tables, respectively. See the DatabaseConnector package for details on setting the connection details for your database, for example by typing
```
?createConnectionDetails
```
Execution of all Achilles pre-computations may take a long time, particularly in single-threaded mode and with cost analyses enabled. See the extras/notes.md file to find out how some analyses, such as the cost pre-computations, can be excluded to make execution faster.

Currently "sql server", "pdw", "oracle", "postgresql", "redshift", "mysql", "impala", and "bigquery" are supported as dbms. cdmVersion can be ONLY 5.x (please look at prior commit history for v4 support).
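A quick pre-flight check of these two settings can save a failed run. The sketch below is a hypothetical helper (checkAchillesSettings is not an Achilles function); the list of platforms is copied from the sentence above, and the version test simply assumes that any string starting with "5" or "5." counts as v5.x.

```r
# Values copied from the list of supported platforms above.
supportedDbms <- c("sql server", "pdw", "oracle", "postgresql",
                   "redshift", "mysql", "impala", "bigquery")

# Hypothetical pre-flight check; not part of the Achilles package.
checkAchillesSettings <- function(dbms, cdmVersion) {
  if (!(dbms %in% supportedDbms)) {
    stop("Unsupported dbms: ", dbms)
  }
  # Accept "5" or "5.<anything>", reject everything else (assumed rule).
  if (!grepl("^5(\\.|$)", cdmVersion)) {
    stop("Only CDM v5.x is supported, got: ", cdmVersion)
  }
  invisible(TRUE)
}

checkAchillesSettings("redshift", "5.3.0")
```

The function stops with an error message on an unsupported value and returns TRUE invisibly otherwise.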

To use AchillesWeb to explore the Achilles statistics, you must first export the statistics to a folder of JSON files, which can optionally be compressed into one gzipped file for easier transport.

exportToJson(connectionDetails, 
  cdmDatabaseSchema = "cdm5_inst", 
  resultsDatabaseSchema = "results", 
  outputPath = "c:/myPath/AchillesExport", 
  cdmVersion = "5.3.0",
  compressIntoOneFile = TRUE) # creates gzipped file of all JSON files

To run only Achilles Heel (component of Achilles), use the following command:

achillesHeel(connectionDetails, 
  cdmDatabaseSchema = "cdm5_inst", 
  resultsDatabaseSchema = "results", 
  scratchDatabaseSchema = "scratch",
  numThreads = 10, # multi-threaded mode
  cdmVersion = "5.3.0")

Possible optional additional steps:
- To see what errors were found (from within R), run:
```
fetchAchillesHeelResults(connectionDetails,resultsDatabaseSchema)
```
- To see a particular analysis, run:
```
fetchAchillesAnalysisResults(connectionDetails,resultsDatabaseSchema,analysisId = 2)
```
- To join data tables with some lookup (overview files), obtain those using commands below:
- To get description of analyses, run getAnalysisDetails().
- To get description of derived measures, run:
```
read.csv(system.file("csv","derived_analysis_details.csv",package="Achilles"),as.is=T)
```
- Similarly, for overview of rules, run:
```
read.csv(system.file("csv","achilles_rule.csv",package="Achilles"),as.is=T)
```
- Also see notes.md for more information (in the extras folder).
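As a rough illustration of joining results with a lookup file, the sketch below attaches rule metadata to Heel results using base R merge(). The two data frames are mock stand-ins for the output of fetchAchillesHeelResults() and the rules overview CSV; their column names and values are assumptions made for this example, so check the actual column names (and their case) in your own output.

```r
# Mock stand-in for Heel results (assumed columns, invented values).
heelResults <- data.frame(
  rule_id = c(1, 3),
  record_count = c(12, 3),
  stringsAsFactors = FALSE
)

# Mock stand-in for the rules overview CSV (assumed columns, invented values).
ruleOverview <- data.frame(
  rule_id = c(1, 2, 3),
  rule_type = c("CDM conformance", "CDM conformance", "DQ"),
  severity = c("error", "error", "warning"),
  stringsAsFactors = FALSE
)

# Left join: keep every Heel result and attach the matching rule metadata.
annotated <- merge(heelResults, ruleOverview, by = "rule_id", all.x = TRUE)
annotated
```

Using all.x = TRUE keeps Heel results even when a rule is missing from the lookup, which makes gaps in the overview file visible as NA values.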

Developers: How to Add or Modify Analyses

Please refer to the README-developers.md file.

Getting Started with Docker

This is an alternative method for running Achilles that does not require R and Java installations, using a Docker container instead.

1. Install Docker and Docker Compose.
2. Clone this repository with git (git clone https://github.com/OHDSI/Achilles.git) and make it your working directory (cd Achilles).
3. Copy env_vars.sample to env_vars and fill in the variable definitions. The ACHILLES_DB_URI should be formatted as <dbms>://<username>:<password>@<host>/<schema>.
4. Copy docker-compose.yml.sample to docker-compose.yml and fill in the data output directory.
5. Build the docker image with docker-compose build.
6. Run Achilles in the background with docker-compose run -d achilles.

Alternatively, you can run it with one long command line, like in the following example:

docker run \
  --rm \
  --net=host \
  -v "$(pwd)"/output:/opt/app/output \
  -e ACHILLES_SOURCE=DEFAULT \
  -e ACHILLES_DB_URI=postgresql://webapi:webapi@localhost:5432/ohdsi \
  -e ACHILLES_CDM_SCHEMA=cdm5 \
  -e ACHILLES_VOCAB_SCHEMA=cdm5 \
  -e ACHILLES_RES_SCHEMA=webapi \
  -e ACHILLES_CDM_VERSION=5 \
  <image name>

License

Achilles is licensed under Apache License 2.0

Pre-computations

Achilles has some compatibility with Data Quality initiatives of the Data Quality Collaborative (DQC; http://repository.edm-forum.org/dqc or GitHub https://github.com/orgs/DQCollaborative). For example, a harmonized set of data quality terms was published by Kahn et al. in 2016.

What Achilles calls an analysis (a pre-computation for a given dataset), the DQC calls a measure.

Some Heel rules take advantage of derived measures, a feature of Heel introduced in version 1.4. A derived measure is the result of an SQL query that takes Achilles analyses as input. It is simply a different view of the pre-computations that is advantageous to materialize. The logic for computing a derived measure can be viewed in the Heel SQL files in /inst/sql/sql_server/heels, which are described further in the Developers README file.

An overview of derived measures is available in the CSV file here.

To allow flexible setting of Achilles Heel rule thresholds in the future, some Heel rules are split into two phases. First, a derived measure is computed and the result is stored in a separate table, ACHILLES_RESULTS_DERIVED. The Heel rule logic then reduces to a simple comparison of whether the derived measure is over a threshold. The mapping of which rules use which pre-computation is available in the CSV file here (previously in inst/csv/achilles_rule.csv); see column linked_measure.
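The two-phase idea can be illustrated with a small base R sketch. Everything here is hypothetical: the analysis ids, the measure name, the column names, and the threshold are invented for the example, and the actual Heel logic lives in the SQL files mentioned above.

```r
# Mock Achilles analysis results (invented ids and counts).
analysisResults <- data.frame(
  analysis_id = c(1, 2),
  count_value = c(1000, 50)
)

# Phase 1: compute a derived measure from the pre-computations and
# materialize it, here as a percentage of one count relative to another.
derived <- data.frame(
  measure_id = "ExampleMeasure:Percentage",
  statistic_value = 100 * analysisResults$count_value[2] /
    analysisResults$count_value[1],
  stringsAsFactors = FALSE
)

# Phase 2: the rule is only a threshold comparison on the derived measure,
# so the threshold can be changed without touching phase 1.
ruleFires <- function(derived, threshold) {
  derived$statistic_value > threshold
}

ruleFires(derived, threshold = 10)
ruleFires(derived, threshold = 1)
```

Separating the computation from the comparison means a site could tighten or relax a threshold without recomputing the measure.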

Heel Rules

Rules are classified into CDM conformance rules and DQ rules - see column rule_type in the CSV file here.

Some Heel rules can be generalized to non-OMOP datasets. Other rules depend on OMOP concept ids, and the code would need to be translated for other CDMs (for example, the rule with rule_id 29 uses an OMOP-specific concept, concept 195075).

Rules that have in their name a prefix [GeneralPopulationOnly] are applicable to datasets that represent a general population. Once metadata for this parameter is implemented by OHDSI, their execution can be limited to such datasets. In the meantime, users should ignore output of rules that are meant for general population if their dataset is not of that type.

Rules are classified by severity into error, warning, and notification (see column severity).
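Until the general-population metadata is available, the filtering has to be done by hand. The sketch below shows one way to do it in base R on a mock rules table; the column name rule_name, the rule names, and the isGeneralPopulation flag are assumptions for illustration only.

```r
# Mock rules table (invented names; rule_name is an assumed column).
rules <- data.frame(
  rule_id = 1:4,
  rule_name = c("[GeneralPopulationOnly] example rule A",
                "example rule B",
                "example rule C",
                "[GeneralPopulationOnly] example rule D"),
  severity = c("warning", "error", "notification", "error"),
  stringsAsFactors = FALSE
)

# Set to TRUE if the dataset represents a general population.
isGeneralPopulation <- FALSE

# Drop rules prefixed with [GeneralPopulationOnly] when they do not apply.
if (!isGeneralPopulation) {
  rules <- rules[!startsWith(rules$rule_name, "[GeneralPopulationOnly]"), ]
}

# Order the remaining rules by severity, most serious first.
rules$severity <- factor(rules$severity,
                         levels = c("error", "warning", "notification"))
rules <- rules[order(rules$severity), ]
rules$rule_id
```

With the flag set to FALSE, only the two rules without the prefix remain, listed with the error before the notification.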

Acknowledgements

This project is supported in part through the National Science Foundation grant IIS 1251151.
