A de-identification toolkit for clinical text in Hebrew.
HebSafeHarbor was developed according to the requirements described in הצעה למתווה התממה של טקסט רפואי - נמל מבטחים ("Proposal for a De-identification Framework for Medical Text: Safe Harbor"; read more here).
The toolkit integrates open-source libraries and assets, including HebSpacy (which runs an NER model based on AlephBERT, NEMO, and BMC), Presidio, Wikipedia, and public lexicons.
📝 Demo website is available here
To install the package, run the following commands, preferably in a virtual environment:
# Create conda env (optional)
conda create --name hebsafeharbor python=3.8
conda activate hebsafeharbor
# Install HebSafeHarbor
pip install hebsafeharbor
# Download the he_ner_news_trf model used by hebsafeharbor
pip install https://github.com/8400TheHealthNetwork/HebSpacy/releases/download/he_ner_news_trf-3.2.1/he_ner_news_trf-3.2.1-py3-none-any.whl
Alternatively, you may clone the repo and install all dependencies:
# Create conda env (optional)
conda create --name hebsafeharbor python=3.8
conda activate hebsafeharbor
# Install dependencies
pip install -r requirements.txt
# (Optional) Install package locally
pip install -e .
# Download the he_ner_news_trf model used by hebsafeharbor
pip install https://github.com/8400TheHealthNetwork/HebSpacy/releases/download/he_ner_news_trf-3.2.1/he_ner_news_trf-3.2.1-py3-none-any.whl
from hebsafeharbor import HebSafeHarbor
hsh = HebSafeHarbor()
text = """שרון לוי התאשפזה ב02.02.2012 וגרה בארלוזרוב 16 רמת גן"""
doc = {"text": text}
output = hsh([doc])
print(output[0].anonymized_text.text)
# > <שם_> התאשפזה ב<יום_>.02.2012 וגרה <מיקום_> 16 רמת גן
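Because the call above takes a list of documents, several notes can be processed in one call. The following is a minimal sketch that assumes only what the example shows: a list of `{"text": ...}` dicts goes in, and each returned item exposes `anonymized_text.text`. The helper name `anonymize_texts` is hypothetical and not part of the package, and the sketch assumes outputs come back in input order.

```python
from typing import Callable, Iterable, List


def anonymize_texts(hsh: Callable, texts: Iterable[str]) -> List[str]:
    """Run a HebSafeHarbor-like callable over several texts.

    `hsh` is expected to behave like the HebSafeHarbor() instance in the
    example above (assumption: one output per input, in the same order).
    """
    # Wrap each raw string in the document shape used by the example.
    docs = [{"text": text} for text in texts]
    outputs = hsh(docs)
    # Keep only the anonymized string of each result.
    return [output.anonymized_text.text for output in outputs]
```

With the `hsh` object created above, `anonymize_texts(hsh, [text])` would return a one-element list holding the same string that the example prints.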
The easiest way to consume HebSafeHarbor as a service with a REST API and demo application is through the docker-compose setup.
Run the docker-compose command against the docker-compose.yml file in the root directory to get the latest containers from Docker Hub:
docker-compose up -d --build
Navigate in the browser to http://server.localhost/docs to access the service's Swagger documentation, and to http://demo.localhost to test the demo application.
To run the containers against the repo's code, run the following command:
docker-compose -f docker-compose-development.yml up -d --build
HebSafeHarbor can be consumed as a REST service powered by the FastAPI library. To run the service locally, first install the environment requirements as follows:
# Create conda env (optional)
conda create --name hebsafeharbor-server python=3.8
conda activate hebsafeharbor-server
# Install dependencies
pip install -r requirements.txt
# Install server dependencies
pip install -r requirements-server.txt
Once installation is done, run the following command to initiate the service:
uvicorn server:app --reload
The service should be running at http://127.0.0.1:8000. Navigate to this URL and make sure you see the following message:
{"message":"Welcome to the Hebrew Safe Harbor!"}
Now go to http://127.0.0.1:8000/docs to see interactive API documentation (Swagger).
Alternatively, you can query the service directly by sending POST requests to http://127.0.0.1:8000/query with the payload described in the API documentation.
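As an illustration of querying the endpoint from code, here is a minimal sketch using only the Python standard library. The payload schema is not reproduced here, so the `payload` argument is left to the caller and should follow the schema shown on the Swagger page. The function names `build_query_request` and `send_query` are hypothetical helpers, and a JSON request and response body are assumed.

```python
import json
import urllib.request


def build_query_request(
    payload: dict, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a JSON POST request for the /query endpoint."""
    # ensure_ascii=False keeps Hebrew text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send_query(payload: dict, base_url: str = "http://127.0.0.1:8000"):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_query_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the URL and body before anything goes over the network.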
To download and run the official release as a Docker container, run the following commands:
# Download image
docker pull hebsafeharbor/hebsafeharbor
# Run container
docker run --name hsh_server -d -p 8000:8000 hebsafeharbor/hebsafeharbor:latest
Alternatively, you can build and run the Docker container:
# Build image
docker build ./ --tag hebsafeharbor_server
# Run container
docker run --name hsh_server -d -p 8000:8000 hebsafeharbor_server
Navigate to http://localhost:8000 to validate the service is up and running. Similarly, the interactive documentation is at http://localhost:8000/docs and the query endpoint is at http://localhost:8000/query.
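The same check can be scripted, which is useful because the container may need some time before it starts answering. The sketch below polls the root URL until it returns the welcome message. It assumes the container serves the same `{"message": ...}` response shown for the local service; `is_welcome_response` and `wait_for_service` are hypothetical helper names, and the timeout values are arbitrary.

```python
import json
import time
import urllib.error
import urllib.request

# Message shown in the local-service section (assumed identical in Docker).
EXPECTED_MESSAGE = "Welcome to the Hebrew Safe Harbor!"


def is_welcome_response(body: bytes) -> bool:
    """Return True if the body is the expected JSON welcome message."""
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and data.get("message") == EXPECTED_MESSAGE


def wait_for_service(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll base_url until the welcome message appears or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=5) as response:
                if is_welcome_response(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Service not reachable yet; retry after a short pause.
        time.sleep(interval_s)
    return False
```

Calling `wait_for_service()` right after `docker run` gives a simple readiness gate before sending queries.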
For experimentation and testing purposes, we provide a Streamlit demo application. Note that the demo application functions as a (semi-)web client in the sense that it queries the remote REST endpoint and visualizes the results.
In order for the demo to work and interact with the server application, you will need to set the HSH_SERVER environment variable to the REST endpoint URL.
For example, assuming you are running the server as a Docker container:
[For Windows (PowerShell)]
$env:HSH_SERVER="http://localhost:8000"
[For Linux]
export HSH_SERVER=http://localhost:8000
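A client can fail early when the variable is missing or malformed. The sketch below shows one way to read and validate HSH_SERVER; it is not taken from the demo's source, and `resolve_server_url` is a hypothetical helper. It assumes the value must be a full URL including the scheme, as in the examples above.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def resolve_server_url(environ: Mapping[str, str] = os.environ) -> str:
    """Read HSH_SERVER and check that it looks like an http(s) URL."""
    raw = environ.get("HSH_SERVER", "").strip()
    if not raw:
        raise RuntimeError("HSH_SERVER is not set")
    parsed = urlparse(raw)
    # A bare host such as "172.17.0.2" has no scheme and is rejected here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"HSH_SERVER must be a full URL such as http://localhost:8000, got {raw!r}"
        )
    return raw.rstrip("/")
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.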
To run the demo locally, first install the requirements as follows:
# Create conda env (optional)
conda create --name hebsafeharbor_demo python=3.8
conda activate hebsafeharbor_demo
# Install dependencies
pip install -r demo/requirements-streamlit.txt
Once installation is done, run the following command:
streamlit run demo/demo_app.py
Now go to http://localhost:8501 to access the application.
To download and run the official Docker container, run the following commands:
# Download image
docker pull hebsafeharbor/demo_application
# Run container - replace <SERVICE URL> with the server URL (e.g., docker run -d -e HSH_SERVER=http://172.17.0.2:8000 -p 8501:8501 hebsafeharbor/demo_application:latest)
docker run --name hsh_demo -d -e "HSH_SERVER=<SERVICE URL>" -p 8501:8501 hebsafeharbor/demo_application:latest
Alternatively, you can build and run the Docker container:
# Build image
docker build demo/. --tag hebsafeharbor_demo
# Run container - replace <SERVICE URL> with the server URL (e.g., docker run -d -e HSH_SERVER=http://172.17.0.2:8000 -p 8501:8501 hebsafeharbor_demo)
docker run --name hsh_demo -d -e "HSH_SERVER=<SERVICE URL>" -p 8501:8501 hebsafeharbor_demo
Navigate to http://localhost:8501 to access the demo application.
Note that the server container includes a transformer model that consumes more than 2 GB of memory; increase the Docker memory allocation to 6 GB.
Read more here.
Run the docker network inspect bridge command to get networking information about the running containers:
docker network inspect bridge
Your response should look like the following:
[
{
"Name": "bridge",
...
"Containers": {
"9611cad28701cfe0877c2bfed9ad2710202492de6d574d42a6714f439cf4f2d2": {
"Name": "hsh_server",
...
"IPv4Address": "172.17.0.2/16",
...
}
},
...
}
]
The IP address is the IPv4Address value of the container whose "Name" is "hsh_server", listed under Containers.
In our case, the server IP address is 172.17.0.2, so HSH_SERVER=http://172.17.0.2:8000
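The lookup described above can also be done programmatically. The following sketch parses the JSON printed by `docker network inspect bridge` and returns the address of a named container without the subnet suffix. The function name `find_container_ip` is hypothetical, and the sketch relies only on the fields visible in the sample response (`Containers`, `Name`, `IPv4Address`).

```python
import json
from typing import Optional


def find_container_ip(
    inspect_output: str, container_name: str = "hsh_server"
) -> Optional[str]:
    """Return the IPv4 address of a container from `network inspect` JSON."""
    networks = json.loads(inspect_output)
    for network in networks:
        # "Containers" maps container IDs to their network details.
        for container in (network.get("Containers") or {}).values():
            if container.get("Name") == container_name:
                # Drop the subnet suffix: "172.17.0.2/16" -> "172.17.0.2"
                return container.get("IPv4Address", "").split("/")[0] or None
    return None
```

The input string can be obtained with `subprocess.run(["docker", "network", "inspect", "bridge"], capture_output=True, text=True, check=True).stdout`; the resulting address then goes into HSH_SERVER together with the scheme and port, for example `http://<ip>:8000`.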
Read more here
NLP capabilities are based on resources developed by ONLP Lab (the lab git), especially AlephBERT and NEMO.
HebSafeHarbor is an open-source project developed by 8400 The Health Network.