-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into mvc-legos
- Loading branch information
Showing
117 changed files
with
111,886 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Build an LLM RAG Chatbot With LangChain | ||
|
||
This repo contains the source code for [Build an LLM RAG Chatbot With LangChain](https://realpython.com/build-llm-rag-chatbot-with-langchain/) | ||
|
||
To run the final application that you'll build in this tutorial, you can use the code provided in `source_code_final/`. | ||
|
||
## Setup | ||
|
||
Create a `.env` file in the root directory and add the following environment variables: | ||
|
||
```.env | ||
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY> | ||
NEO4J_URI=<YOUR_NEO4J_URI> | ||
NEO4J_USERNAME=<YOUR_NEO4J_USERNAME> | ||
NEO4J_PASSWORD=<YOUR_NEO4J_PASSWORD> | ||
HOSPITALS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/hospitals.csv | ||
PAYERS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/payers.csv | ||
PHYSICIANS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/physicians.csv | ||
PATIENTS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/patients.csv | ||
VISITS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/visits.csv | ||
REVIEWS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/reviews.csv | ||
HOSPITAL_AGENT_MODEL=gpt-3.5-turbo-1106 | ||
HOSPITAL_CYPHER_MODEL=gpt-3.5-turbo-1106 | ||
HOSPITAL_QA_MODEL=gpt-3.5-turbo-0125 | ||
CHATBOT_URL=http://host.docker.internal:8000/hospital-rag-agent | ||
``` | ||
|
||
The chatbot uses OpenAI LLMs, so you'll need to create an [OpenAI API key](https://realpython.com/generate-images-with-dalle-openai-api/#get-your-openai-api-key) and store it as `OPENAI_API_KEY`. | ||
|
||
The three `NEO4J_` variables are used to connect to your Neo4j AuraDB instance. Follow the directions [here](https://neo4j.com/cloud/platform/aura-graph-database/?ref=docs-nav-get-started) to create a free instance. | ||
|
||
Once you have a running Neo4j instance, and have filled out all the environment variables in `.env`, you can run the entire project with [Docker Compose](https://docs.docker.com/compose/). You can install Docker Compose by following [these directions](https://docs.docker.com/compose/install/). | ||
|
||
Once you've filled in all of the environment variables, set up a Neo4j AuraDB instance, and installed Docker Compose, open a terminal and run: | ||
|
||
```console | ||
$ docker-compose up --build | ||
``` | ||
|
||
After each container finishes building, you'll be able to access the chatbot API at `http://localhost:8000/docs` and the Streamlit app at `http://localhost:8501/`. |
12 changes: 12 additions & 0 deletions
12
langchain-rag-app/source_code_final/chatbot_api/Dockerfile
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# chatbot_api/Dockerfile | ||
|
||
FROM python:3.11-slim | ||
|
||
WORKDIR /app | ||
COPY ./src/ /app | ||
|
||
COPY ./pyproject.toml /code/pyproject.toml | ||
RUN pip install /code/. | ||
|
||
EXPOSE 8000 | ||
CMD ["sh", "entrypoint.sh"] |
19 changes: 19 additions & 0 deletions
19
langchain-rag-app/source_code_final/chatbot_api/pyproject.toml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
[project] | ||
name = "chatbot_api" | ||
version = "0.1" | ||
dependencies = [ | ||
"asyncio==3.4.3", | ||
"fastapi==0.109.0", | ||
"langchain==0.1.0", | ||
"langchain-openai==0.0.2", | ||
"langchainhub==0.1.14", | ||
"neo4j==5.14.1", | ||
"numpy==1.26.2", | ||
"openai==1.7.2", | ||
"opentelemetry-api==1.22.0", | ||
"pydantic==2.5.1", | ||
"uvicorn==0.25.0" | ||
] | ||
|
||
[project.optional-dependencies] | ||
dev = ["black", "flake8"] |
82 changes: 82 additions & 0 deletions
82
langchain-rag-app/source_code_final/chatbot_api/src/agents/hospital_rag_agent.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
import os | ||
|
||
from chains.hospital_cypher_chain import hospital_cypher_chain | ||
from chains.hospital_review_chain import reviews_vector_chain | ||
from langchain import hub | ||
from langchain.agents import AgentExecutor, Tool, create_openai_functions_agent | ||
from langchain_openai import ChatOpenAI | ||
from tools.wait_times import ( | ||
get_current_wait_times, | ||
get_most_available_hospital, | ||
) | ||
|
||
HOSPITAL_AGENT_MODEL = os.getenv("HOSPITAL_AGENT_MODEL") | ||
|
||
hospital_agent_prompt = hub.pull("hwchase17/openai-functions-agent") | ||
|
||
tools = [ | ||
Tool( | ||
name="Experiences", | ||
func=reviews_vector_chain.invoke, | ||
description="""Useful when you need to answer questions | ||
about patient experiences, feelings, or any other qualitative | ||
question that could be answered about a patient using semantic | ||
search. Not useful for answering objective questions that involve | ||
counting, percentages, aggregations, or listing facts. Use the | ||
entire prompt as input to the tool. For instance, if the prompt is | ||
"Are patients satisfied with their care?", the input should be | ||
"Are patients satisfied with their care?". | ||
""", | ||
), | ||
Tool( | ||
name="Graph", | ||
func=hospital_cypher_chain.invoke, | ||
description="""Useful for answering questions about patients, | ||
physicians, hospitals, insurance payers, patient review | ||
statistics, and hospital visit details. Use the entire prompt as | ||
input to the tool. For instance, if the prompt is "How many visits | ||
have there been?", the input should be "How many visits have | ||
there been?". | ||
""", | ||
), | ||
Tool( | ||
name="Waits", | ||
func=get_current_wait_times, | ||
description="""Use when asked about current wait times | ||
at a specific hospital. This tool can only get the current | ||
wait time at a hospital and does not have any information about | ||
aggregate or historical wait times. Do not pass the word "hospital" | ||
as input, only the hospital name itself. For example, if the prompt | ||
is "What is the current wait time at Jordan Inc Hospital?", the | ||
input should be "Jordan Inc". | ||
""", | ||
), | ||
Tool( | ||
name="Availability", | ||
func=get_most_available_hospital, | ||
description=""" | ||
Use when you need to find out which hospital has the shortest | ||
wait time. This tool does not have any information about aggregate | ||
or historical wait times. This tool returns a dictionary with the | ||
hospital name as the key and the wait time in minutes as the value. | ||
""", | ||
), | ||
] | ||
|
||
chat_model = ChatOpenAI( | ||
model=HOSPITAL_AGENT_MODEL, | ||
temperature=0, | ||
) | ||
|
||
hospital_rag_agent = create_openai_functions_agent( | ||
llm=chat_model, | ||
prompt=hospital_agent_prompt, | ||
tools=tools, | ||
) | ||
|
||
hospital_rag_agent_executor = AgentExecutor( | ||
agent=hospital_rag_agent, | ||
tools=tools, | ||
return_intermediate_steps=True, | ||
verbose=True, | ||
) |
157 changes: 157 additions & 0 deletions
157
langchain-rag-app/source_code_final/chatbot_api/src/chains/hospital_cypher_chain.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
import os | ||
|
||
from langchain.chains import GraphCypherQAChain | ||
from langchain.prompts import PromptTemplate | ||
from langchain_community.graphs import Neo4jGraph | ||
from langchain_openai import ChatOpenAI | ||
|
||
HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL") | ||
HOSPITAL_CYPHER_MODEL = os.getenv("HOSPITAL_CYPHER_MODEL") | ||
|
||
graph = Neo4jGraph( | ||
url=os.getenv("NEO4J_URI"), | ||
username=os.getenv("NEO4J_USERNAME"), | ||
password=os.getenv("NEO4J_PASSWORD"), | ||
) | ||
|
||
graph.refresh_schema() | ||
|
||
cypher_generation_template = """ | ||
Task: | ||
Generate Cypher query for a Neo4j graph database. | ||
Instructions: | ||
Use only the provided relationship types and properties in the schema. | ||
Do not use any other relationship types or properties that are not provided. | ||
Schema: | ||
{schema} | ||
Note: | ||
Do not include any explanations or apologies in your responses. | ||
Do not respond to any questions that might ask anything other than | ||
for you to construct a Cypher statement. Do not include any text except | ||
the generated Cypher statement. Make sure the direction of the relationship is | ||
correct in your queries. Make sure you alias both entities and relationships | ||
properly. Do not run any queries that would add to or delete from | ||
the database. Make sure to alias all statements that follow as with | ||
statement (e.g. WITH v as visit, c.billing_amount as billing_amount) | ||
If you need to divide numbers, make sure to | ||
filter the denominator to be non zero. | ||
Examples: | ||
# Who is the oldest patient and how old are they? | ||
MATCH (p:Patient) | ||
RETURN p.name AS oldest_patient, | ||
duration.between(date(p.dob), date()).years AS age | ||
ORDER BY age DESC | ||
LIMIT 1 | ||
# Which physician has billed the least to Cigna | ||
MATCH (p:Payer)<-[c:COVERED_BY]-(v:Visit)-[t:TREATS]-(phy:Physician) | ||
WHERE p.name = 'Cigna' | ||
RETURN phy.name AS physician_name, SUM(c.billing_amount) AS total_billed | ||
ORDER BY total_billed | ||
LIMIT 1 | ||
# Which state had the largest percent increase in Cigna visits | ||
# from 2022 to 2023? | ||
MATCH (h:Hospital)<-[:AT]-(v:Visit)-[:COVERED_BY]->(p:Payer) | ||
WHERE p.name = 'Cigna' AND v.admission_date >= '2022-01-01' AND | ||
v.admission_date < '2024-01-01' | ||
WITH h.state_name AS state, COUNT(v) AS visit_count, | ||
SUM(CASE WHEN v.admission_date >= '2022-01-01' AND | ||
v.admission_date < '2023-01-01' THEN 1 ELSE 0 END) AS count_2022, | ||
SUM(CASE WHEN v.admission_date >= '2023-01-01' AND | ||
v.admission_date < '2024-01-01' THEN 1 ELSE 0 END) AS count_2023 | ||
WITH state, visit_count, count_2022, count_2023, | ||
(toFloat(count_2023) - toFloat(count_2022)) / toFloat(count_2022) * 100 | ||
AS percent_increase | ||
RETURN state, percent_increase | ||
ORDER BY percent_increase DESC | ||
LIMIT 1 | ||
# How many non-emergency patients in North Carolina have written reviews? | ||
match (r:Review)<-[:WRITES]-(v:Visit)-[:AT]->(h:Hospital) | ||
where h.state_name = 'NC' and v.admission_type <> 'Emergency' | ||
return count(*) | ||
String category values: | ||
Test results are one of: 'Inconclusive', 'Normal', 'Abnormal' | ||
Visit statuses are one of: 'OPEN', 'DISCHARGED' | ||
Admission Types are one of: 'Elective', 'Emergency', 'Urgent' | ||
Payer names are one of: 'Cigna', 'Blue Cross', 'UnitedHealthcare', 'Medicare', | ||
'Aetna' | ||
A visit is considered open if its status is 'OPEN' and the discharge date is | ||
missing. | ||
Use abbreviations when | ||
filtering on hospital states (e.g. "Texas" is "TX", | ||
"Colorado" is "CO", "North Carolina" is "NC", | ||
"Florida" is "FL", "Georgia" is "GA, etc.) | ||
Make sure to use IS NULL or IS NOT NULL when analyzing missing properties. | ||
Never return embedding properties in your queries. You must never include the | ||
statement "GROUP BY" in your query. Make sure to alias all statements that | ||
follow as with statement (e.g. WITH v as visit, c.billing_amount as | ||
billing_amount) | ||
If you need to divide numbers, make sure to filter the denominator to be non | ||
zero. | ||
The question is: | ||
{question} | ||
""" | ||
|
||
cypher_generation_prompt = PromptTemplate( | ||
input_variables=["schema", "question"], template=cypher_generation_template | ||
) | ||
|
||
qa_generation_template = """You are an assistant that takes the results | ||
from a Neo4j Cypher query and forms a human-readable response. The | ||
query results section contains the results of a Cypher query that was | ||
generated based on a users natural language question. The provided | ||
information is authoritative, you must never doubt it or try to use | ||
your internal knowledge to correct it. Make the answer sound like a | ||
response to the question. | ||
Query Results: | ||
{context} | ||
Question: | ||
{question} | ||
If the provided information is empty, say you don't know the answer. | ||
Empty information looks like this: [] | ||
If the information is not empty, you must provide an answer using the | ||
results. If the question involves a time duration, assume the query | ||
results are in units of days unless otherwise specified. | ||
When names are provided in the query results, such as hospital names, | ||
beware of any names that have commas or other punctuation in them. | ||
For instance, 'Jones, Brown and Murray' is a single hospital name, | ||
not multiple hospitals. Make sure you return any list of names in | ||
a way that isn't ambiguous and allows someone to tell what the full | ||
names are. | ||
Never say you don't have the right information if there is data in | ||
the query results. Make sure to show all the relevant query results | ||
if you're asked. | ||
Helpful Answer: | ||
""" | ||
|
||
qa_generation_prompt = PromptTemplate( | ||
input_variables=["context", "question"], template=qa_generation_template | ||
) | ||
|
||
hospital_cypher_chain = GraphCypherQAChain.from_llm( | ||
cypher_llm=ChatOpenAI(model=HOSPITAL_CYPHER_MODEL, temperature=0), | ||
qa_llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0), | ||
graph=graph, | ||
verbose=True, | ||
qa_prompt=qa_generation_prompt, | ||
cypher_prompt=cypher_generation_prompt, | ||
validate_cypher=True, | ||
top_k=100, | ||
) |
60 changes: 60 additions & 0 deletions
60
langchain-rag-app/source_code_final/chatbot_api/src/chains/hospital_review_chain.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
import os | ||
|
||
from langchain.chains import RetrievalQA | ||
from langchain.prompts import ( | ||
ChatPromptTemplate, | ||
HumanMessagePromptTemplate, | ||
PromptTemplate, | ||
SystemMessagePromptTemplate, | ||
) | ||
from langchain.vectorstores.neo4j_vector import Neo4jVector | ||
from langchain_openai import ChatOpenAI, OpenAIEmbeddings | ||
|
||
HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL") | ||
|
||
neo4j_vector_index = Neo4jVector.from_existing_graph( | ||
embedding=OpenAIEmbeddings(), | ||
url=os.getenv("NEO4J_URI"), | ||
username=os.getenv("NEO4J_USERNAME"), | ||
password=os.getenv("NEO4J_PASSWORD"), | ||
index_name="reviews", | ||
node_label="Review", | ||
text_node_properties=[ | ||
"physician_name", | ||
"patient_name", | ||
"text", | ||
"hospital_name", | ||
], | ||
embedding_node_property="embedding", | ||
) | ||
|
||
review_template = """Your job is to use patient | ||
reviews to answer questions about their experience at | ||
a hospital. Use the following context to answer questions. | ||
Be as detailed as possible, but don't make up any information | ||
that's not from the context. If you don't know an answer, | ||
say you don't know. | ||
{context} | ||
""" | ||
|
||
review_system_prompt = SystemMessagePromptTemplate( | ||
prompt=PromptTemplate( | ||
input_variables=["context"], template=review_template | ||
) | ||
) | ||
|
||
review_human_prompt = HumanMessagePromptTemplate( | ||
prompt=PromptTemplate(input_variables=["question"], template="{question}") | ||
) | ||
messages = [review_system_prompt, review_human_prompt] | ||
|
||
review_prompt = ChatPromptTemplate( | ||
input_variables=["context", "question"], messages=messages | ||
) | ||
|
||
reviews_vector_chain = RetrievalQA.from_chain_type( | ||
llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0), | ||
chain_type="stuff", | ||
retriever=neo4j_vector_index.as_retriever(k=12), | ||
) | ||
reviews_vector_chain.combine_documents_chain.llm_chain.prompt = review_prompt |
7 changes: 7 additions & 0 deletions
7
langchain-rag-app/source_code_final/chatbot_api/src/entrypoint.sh
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/bin/bash | ||
|
||
# Run any setup steps or pre-processing tasks here | ||
echo "Starting hospital RAG FastAPI service..." | ||
|
||
# Start the main application | ||
uvicorn main:app --host 0.0.0.0 --port 8000 |
Oops, something went wrong.