NLP-embedding-visualisation

This repo contains supplementary interactive visualisations for the paper "Hospital-wide Natural Language Processing summarising the health data of 1 million patients".

The visualisations are available here: https://khp-informatics.github.io/NLP-embedding-visualisation/

The preprint paper is available here: https://www.medrxiv.org/content/10.1101/2022.09.15.22279981v2

Available visualisations

Patient embedding

Clustering of patients based on SNOMED disorder codes detected in free text. A sample of 100,000 patients was embedded based on normalised annotation counts for all SNOMED disorder codes detected in at least 1,000 patients at King's College Hospital. These vectors were reduced to 50 dimensions using PCA, then to 2 dimensions using t-SNE. Colour indicates cluster membership (50 clusters) assigned by agglomerative clustering with Ward linkage.
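As a rough illustration of the input to this pipeline, the sketch below builds per-patient vectors from annotation counts. It is not the code from clustering.ipynb. The function name `build_features`, the input layout and the choice of normalisation (each patient's retained counts sum to 1) are assumptions, since the normalisation method is not stated here. The PCA, t-SNE and Ward clustering steps would then run on these vectors and are not shown.

```python
from collections import Counter


def build_features(patient_counts, min_patients=1000):
    """Build normalised count vectors (hypothetical helper).

    patient_counts maps patient id -> {SNOMED code: annotation count}.
    Only codes detected in at least `min_patients` patients are kept.
    """
    # Count the number of patients in which each code was detected.
    patients_per_code = Counter()
    for counts in patient_counts.values():
        patients_per_code.update(code for code, n in counts.items() if n > 0)

    # Fixed, sorted vocabulary of sufficiently common codes.
    vocabulary = sorted(
        code for code, n in patients_per_code.items() if n >= min_patients
    )

    vectors = {}
    for patient, counts in patient_counts.items():
        row = [counts.get(code, 0) for code in vocabulary]
        total = sum(row)
        # Assumption: normalise so each patient's retained counts sum to 1.
        vectors[patient] = [value / total if total else 0.0 for value in row]
    return vocabulary, vectors
```

With toy data, `build_features({"p1": {"A": 3, "B": 1}, "p2": {"A": 1, "C": 4}}, min_patients=2)` keeps only code "A", because "B" and "C" each appear in a single patient.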

The prevalence of SNOMED codes is calculated for each cluster, and the count of each code is propagated up the SNOMED ontology to all parents. The following SNOMED codes are then removed because they are uninformative (most have 100% prevalence in all clusters, as they are high-level parent codes): 138875005, 64572001, 301857004, 123946008, 118234003, 404684003, 362965005. When a cluster is selected, up to 5 codes are shown. These are the most prevalent codes that are relevant to at least 50% of the patients in the cluster.
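The cluster summary described above can be sketched as follows. This is one possible reading rather than the repository's implementation: each patient's codes are expanded to include all ancestors, prevalence is the fraction of patients in the cluster carrying a code, and the `parents` mapping (child code to list of parent codes) is a hypothetical stand-in for the SNOMED hierarchy. The names `ancestors` and `cluster_summary` are illustrative.

```python
from collections import Counter

# High-level parent codes listed above as uninformative.
EXCLUDED = {
    "138875005", "64572001", "301857004", "123946008",
    "118234003", "404684003", "362965005",
}


def ancestors(code, parents):
    """Return every ancestor of `code` given a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(code, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def cluster_summary(patient_codes, parents, max_codes=5, min_prevalence=0.5):
    """Summarise one cluster; patient_codes is a list of code sets."""
    counts = Counter()
    for codes in patient_codes:
        # Propagate each detected code to all of its ancestors.
        expanded = set(codes)
        for code in codes:
            expanded |= ancestors(code, parents)
        counts.update(expanded - EXCLUDED)

    n = len(patient_codes)
    # Keep codes present in at least `min_prevalence` of patients,
    # most prevalent first, and show at most `max_codes` of them.
    ranked = [
        (code, count / n)
        for code, count in counts.most_common()
        if count / n >= min_prevalence
    ]
    return ranked[:max_codes]
```

Expanding per patient before counting means a parent code is counted once per patient even when several of its children were detected, which keeps prevalence at or below 100%.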

For performance reasons, this visualisation is further subsampled to 20% of the original data, stratified by cluster.
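A stratified subsample of this kind could look like the sketch below, which keeps the same fraction of points from every cluster. The function name, the fixed seed and the rule of keeping at least one point per cluster are assumptions made for illustration, not details taken from the repository.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction=0.2, seed=0):
    """Return sorted indices keeping `fraction` of each cluster.

    labels[i] is the cluster label of point i.
    """
    rng = random.Random(seed)

    # Group point indices by cluster label.
    by_cluster = defaultdict(list)
    for index, label in enumerate(labels):
        by_cluster[label].append(index)

    keep = []
    for indices in by_cluster.values():
        # Assumption: keep at least one point so no cluster disappears.
        k = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, k))
    return sorted(keep)
```

Sampling within each cluster, rather than across the whole dataset, preserves the relative cluster sizes in the reduced plot.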

Treemap - top 100 concepts and "others" (v1)

LINK

Hover over a concept to see the concept name and total annotation count (the number of times the concept was detected, not the number of patients). Click a type (finding, disorder, substance) to expand those annotations. Click "all" to return to the overview. Only the top 100 concepts per type are shown. All remaining concepts of that type are merged into the "other" group, and the number in brackets indicates the number of merged concepts.
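The grouping into top concepts plus "other" can be sketched as below for a single type. The function name and the exact label format `other (N)` are assumptions for illustration. The treemap itself is rendered in treemap.html and may prepare its data differently.

```python
from collections import Counter


def top_concepts_with_other(counts, top_n=100):
    """Keep the `top_n` most frequent concepts and merge the rest.

    counts maps concept name -> total annotation count for one type.
    """
    ranked = Counter(counts).most_common()
    top = ranked[:top_n]
    rest = ranked[top_n:]
    if rest:
        # The bracketed number is how many concepts were merged.
        merged_total = sum(count for _, count in rest)
        top.append((f"other ({len(rest)})", merged_total))
    return top
```

Skipping the final `if rest:` branch gives the behaviour of the v2 treemap, where the merged group is omitted entirely.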

Treemap - top 100 concepts only (v2)

LINK

Hover over a concept to see the concept name and total annotation count (the number of times the concept was detected, not the number of patients). Click a type (finding, disorder, substance) to expand those annotations. Click "all" to return to the overview. Only the top 100 concepts per type are shown.

Authors

Daniel Bean, Zeljko Kraljevic, Anthony Shek, James Teo, Richard Dobson

Cite

The preprint paper is available here: https://www.medrxiv.org/content/10.1101/2022.09.15.22279981v2

Hospital-wide Natural Language Processing summarising the health data of 1 million patients

Daniel Bean, Zeljko Kraljevic, Anthony Shek, James Teo, Richard Dobson

medRxiv 2022.09.15.22279981; doi: https://doi.org/10.1101/2022.09.15.22279981

Acknowledgements

This work uses data provided by patients and collected by the NHS as part of their care and support. We would like to thank the patients on the King's Electronic Records Research Interface (KERRI), the NIHR Applied Research Centre South London, the NIHR Maudsley Biomedical Research Centre, the London AI Centre for Value-based Healthcare, the NHS AI Lab and Health Data Research UK.

Funding

The project has received funding support from Innovate UK, NHS AI Lab, Office of Life Sciences, Health Data Research UK, NIHR Maudsley Biomedical Research Centre and NIHR Applied Research Centre South London.
