EHR retrieval and query with GenAI models #504
To all involved: awesome work here! 🚀
Took an initial pass focused on FHIR; will take another pass soon to review the LMM workflow.
Given the scope of these changes, it would be good to have some level of CTest unit test coverage to integrate with the nightly dashboards. See the Endoscopy Tool Tracking CMakeLists.txt for an example.
Took a best-effort pass through the lmm agent folder; looking good! Left some comments, mostly minor edits or fixes.
There are a few large JSON files in this PR. Would it be possible to reduce their contents to <1MB, provide generation instructions, or even distribute them ourselves rather than adding them to the commit history?
We should put the data on NGC.
It defeats the purpose if we let the files in on the first merge, and we had made that decision.
Thank you for the reply, @MMelQin. I was not aware a decision had been made one way or the other, as I saw no comment addressing this issue in the PR nor any other communication about it in Slack/e-mail. RE: "The JSON files used for locally populating vector DB is not really needed (used as a safety net for the GTC demo)," are you referring to the ehr_data.json file? If so, that is not precisely a safety net; it is the result of running the script create_ehr_database_local, and it is included as part of the PR.
Hate to say it in this venue, but to prevent misconception: @cdinea, you are wrong here. Also, I'd strongly suggest using a separate channel to discuss and sync up on the details.
Thanks, @MMelQin. Indeed, I reread my comment and would like to correct it; I sent it too soon :) `get_ehr_data` is needed in `create_ehr_db_local()` to create the local database:

```python
# Path to the EHR data as input
# Path to the downloaded fine tuned model
EHR_FINETUNED_MODEL = "/workspace/volumes/models/bge-large-ehr-finetune"
# Persistent storage folder for the Vector DB
PERSISTENT_FOLDER = "/workspace/holohub/applications/ehr_query_llm/lmm/rag/ehr/db"


def get_ehr_data():
    ...
```
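For readers following this thread, the roles of `get_ehr_data()` and `create_ehr_db_local()` can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the names `load_ehr_records` and `create_local_vector_db` are hypothetical, the input is assumed to be a JSON list of record objects, a toy hashed bag-of-words `embed()` stands in for the fine-tuned BGE model at `EHR_FINETUNED_MODEL`, and a plain `index.json` file stands in for the persistent vector DB under `PERSISTENT_FOLDER`.

```python
import hashlib
import json
import math
import tempfile
from pathlib import Path

DIM = 64  # toy embedding size; an assumption, not the BGE model's dimension


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def load_ehr_records(path: Path) -> list[dict]:
    """Hypothetical loader: expects a JSON list of EHR records (dicts)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def create_local_vector_db(records: list[dict], persist_dir: Path) -> Path:
    """Embed each record's JSON text and persist (text, vector) pairs to disk."""
    persist_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    for record in records:
        text = json.dumps(record, sort_keys=True)
        entries.append({"text": text, "embedding": embed(text)})
    out = persist_dir / "index.json"
    out.write_text(json.dumps(entries), encoding="utf-8")
    return out


# Example: build a one-record index in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "ehr_data.json"
    src.write_text(
        json.dumps([{"resourceType": "Condition", "code": "hypertension"}]),
        encoding="utf-8",
    )
    index_path = create_local_vector_db(load_ehr_records(src), Path(tmp) / "db")
    entries = json.loads(index_path.read_text(encoding="utf-8"))
print(len(entries))
```

Keeping the data loader separate from the index builder mirrors the point made above: the builder needs the EHR data as input, so the loader cannot simply be dropped.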
Will consider adding this in the next phase of this example app, at least for the operators. |
As explained and discussed above, the couple of large JSON files were removed from this PR, as they were indeed unnecessary. The large JSON file for FHIR Sanitizer module testing can be downloaded from a public source (instructions added in the code with this commit), and the other one, used by the EHR Builder Agent, was for an edge case in a demo, but the code was erroneously copied over; this is fixed with this commit.
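As a rough illustration of what sanitizing FHIR test data can involve, the sketch below pulls the resources out of a FHIR `Bundle` (its `entry[].resource` elements) and drops the resource-level `meta` and narrative `text` fields. Which fields to drop is an assumption made for illustration only; it does not describe what the FHIR Sanitizer module in this PR actually does, and `sanitize_bundle` and `sanitize_resource` are hypothetical names.

```python
from typing import Any

# Resource-level keys this sketch treats as noise for retrieval (an assumption,
# not necessarily what the FHIR Sanitizer module in this PR removes).
DROP_KEYS = {"meta", "text"}


def sanitize_resource(resource: dict[str, Any]) -> dict[str, Any]:
    """Drop top-level Resource.meta and the XHTML narrative DomainResource.text.

    Only top-level keys are removed: nested "text" fields such as
    CodeableConcept.text carry clinical meaning and are kept.
    """
    return {k: v for k, v in resource.items() if k not in DROP_KEYS}


def sanitize_bundle(bundle: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the sanitized resources found in a FHIR Bundle's entries."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return [
        sanitize_resource(entry["resource"])
        for entry in bundle.get("entry", [])
        if "resource" in entry  # entries without a resource are skipped
    ]


bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {
            "resource": {
                "resourceType": "Condition",
                "id": "example",
                "meta": {"versionId": "1"},
                "text": {"status": "generated", "div": "<div>...</div>"},
                "code": {"text": "Hypertension"},
            }
        },
        {"search": {"mode": "include"}},
    ],
}
resources = sanitize_bundle(bundle)
print(resources)
```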
* Initial check-in with EHR FHIR client and 0MQ operators
* Added dependencies and their licenses in the metadata file
* Linting fix
* Add the RAG folder for the Vector DB creation and testing
* Add CMake file and Dockerfile and draft WIP Readme
* Add code and instructions on dev testing the app, as well as packaging for distribution and deployment. Noted that codespell complains about the acronym EHR; will look into an exclusion list to address it.
* Add ref to the top level HoloHub Readme.
* HoloHub check required metadata in an intermediate parent folder
* Added metadata file to the operators/ehr_query_llm
* Correct help text
* doc update
* add metadata
* webapp changes
* asr tts and base agent
* agents source code
* selector agent yaml
* ehr agents yaml files
* dockerfile and run script
* minimal README
* Fix linting errors and missing metadata in the LLM code, to pass CI checks
* Completes linting error fixes, for now
* Replace "holoscrub" with "ehr_query_llm" in module and file paths
* Fix isort complaint
* fixed requirements file
* remove chat agent
* Add back the docker file for FHIR
* change base image
* remove pytorch as base
* remove chat agent
* updated requirements
* Fix the issue of LMM docker build by adding holoscan while avoiding pkg conflicts
* Resolve riva.client audio_io import issue
* Fix the langchain use -> langchain.community
* Add a print statement before starting the server
* Correct mistake in HSDK reqs file, and force holoscan>=2.5
* small README fix
* changes to fix void audio transcription
* commit index.html
* Add app suite README along with ignored words for codespell
* Correct links
* Editorial updates, mostly correcting typos that were missed by codespell
* Updated folders, CMakefile, and packager to work with updated dev_container script.
* Addressed review comments and re-tested all modes of running following the README
  - corrected formatting, replaced CMAKE_HOME_DIRECTORY with CMAKE_SOURCE_DIR, added comments for working around HSDK CLI limitation
  - typo missed by codespell, applicaition -> application
  - removed a note on running Python app using the run script without being in the dev container. Less is more
  - stated testing with HSDK v2.5
  - removed the run_cmd_template.sh, which is not needed anymore as the README covers all, and more
* Fixed linting error in the last commit, complaint from black
* Added test data file and its path parsing, and corrected typo in fn name
* Refreshed HSDK version and removed the now useless port defaults
* Replaced test data with example only, and provided URL for downloading full test data
* Addresses PR comments, e.g. large file, commented code, incomplete comments in LMM modules.
* add link for BGE
* removed commented out code from index
* remove commented out code
* add check for holoscan install
* remove commented out code
* remove commented out code
* Fixed EHR Agent using local file and addressed code comments. E2E tested.
* change copyright year
* Rebased to HSDK v2.7, fixed missed typo, and E2E tested
* add eagent config example and more details on zeromq
* fixed typos in README
* modified args field name
* fixed typo
* Added rationale on hosting Riva out of the dev container
* Remove redundant paragraph
* add agent framework description
* fix typo
* moved agent description
* add sed commands
* delete png
* address feedback

---------

Signed-off-by: M Q <[email protected]>
Signed-off-by: Cristiana Dinea <[email protected]>
Co-authored-by: Cristiana Dinea <[email protected]>
This application presents a framework of multiple component services that integrate with a FHIR service to retrieve patient EHRs, together with a RAG pipeline that generates insights using embedding and LLM models.
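The retrieval step of such a RAG pipeline can be sketched as ranking stored record embeddings by cosine similarity to the query embedding, then placing the top matches into the LLM prompt. This is a minimal sketch under stated assumptions: the vectors are hand-written rather than produced by an embedding model, and `top_k`, `build_prompt` and the prompt wording are hypothetical, not the pipeline's actual prompt template.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, docs, k=2):
    """Return the texts of the k docs most similar to query_vec.

    docs is a list of (text, vector) pairs, e.g. loaded from a vector store.
    """
    ranked = sorted(docs, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Place retrieved EHR excerpts ahead of the question (hypothetical template)."""
    excerpts = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the patient record excerpts below.\n"
        f"Context:\n{excerpts}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = [
    ("Condition: hypertension", [1.0, 0.0, 0.0]),
    ("Medication: lisinopril 10 mg daily", [0.8, 0.6, 0.0]),
    ("Allergy: penicillin", [0.0, 0.0, 1.0]),
]
context = top_k([1.0, 0.0, 0.0], docs, k=2)
prompt = build_prompt("What is the patient treated for?", context)
print(prompt)
```

Sorting every stored vector is fine for a handful of records; a vector database would typically use an approximate nearest-neighbor index instead.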
The FHIR application can be run in multiple ways:
The RAG pipeline application has only been tested running in the HoloHub app container, using a start-up shell script that is launched manually. The whole process could potentially be automated with the dev_container build and run CLI commands, with the start-up script serving as the app container's entrypoint.