In today's information-rich world, efficiently organizing, understanding, and retrieving knowledge is crucial. This AI-Powered Knowledge Graph and Question Answering System addresses several key challenges:
- Information Overload: With the vast amount of textual data available, it's challenging to quickly grasp the key concepts and relationships within a body of text. This system automatically extracts and visualizes this information as a knowledge graph.
- Contextual Understanding: Traditional keyword-based search often lacks context. By representing information as a graph, this system captures the relationships between entities, enabling more nuanced and context-aware querying.
- Flexible Knowledge Base: The system can continuously expand its knowledge graph with new information, making it adaptable to various domains and evolving knowledge landscapes.
- Intelligent Question Answering: By combining the power of large language models with structured knowledge graphs, the system can provide more accurate and contextually relevant answers to user queries.
- Bridging Structured and Unstructured Data: This system integrates unstructured text data with structured graph representations, offering the benefits of both.
- Visual Insight: The graph visualization component allows users to intuitively explore complex relationships within the data, facilitating better understanding and decision-making.
- Extensibility: With its modular architecture, the system can be extended to incorporate new data sources, language models, or specialized domain knowledge.
By addressing these needs, this system serves as a powerful tool for researchers, analysts, educators, and anyone dealing with large volumes of textual information, helping them to quickly extract insights, answer questions, and discover new connections within their data.
- Knowledge graph generation from text input
- Question answering based on the generated knowledge graph
- Web search integration for questions not answerable from the graph
- Graph visualization
- Flexible LLM integration (OpenAI and Amazon Bedrock supported)
- Neo4j graph database integration
- Python: The primary programming language used for the project.
- LangChain: For building and managing the AI/LLM pipelines.
- LangGraph: Used for creating the workflow graph.
- Neo4j: Graph database for storing and querying the knowledge graph.
- FastAPI: For creating the API endpoints.
- Streamlit: For building the user interface.
- OpenAI GPT: Large Language Model for text processing and generation.
- Amazon Bedrock: Alternative LLM provider.
- Tavily Search API: For web search functionality.
- NetworkX and Pyvis: For graph visualization.
- Docker: For containerization and easy deployment.
The project is organized into several key components:
- LLM Pipeline: Handles interactions with language models (OpenAI GPT or Amazon Bedrock).
- Graph Database: Manages the Neo4j database operations.
- Content Provider: Supplies text content for knowledge graph generation.
- Workflow Nodes: Individual processing steps in the application workflow.
- API and UI: FastAPI for backend services and Streamlit for the user interface.
The LLM pipeline (src/LLM/Pipeline.py) is responsible for:
- Generating knowledge graph queries
- Routing user requests
- Answering questions based on graph content
- Generating responses for web search results
The graph database component (src/Databases/GraphDatabase.py and src/Databases/Neo4j.py) handles:
- Executing Cypher queries
- Fetching graph data
- Managing database connections
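As a rough illustration of the kind of Cypher this component might execute, the sketch below turns a (subject, relation, object) triple into a parameterized MERGE query. The function names, the Entity label, and the name property are assumptions made for this example; they are not taken from src/Databases/Neo4j.py.

```python
import re


def to_relationship_type(text: str) -> str:
    """Turn free text into a safe Cypher relationship type (hypothetical helper)."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_").upper()
    if not cleaned:
        raise ValueError(f"cannot derive a relationship type from {text!r}")
    # Unquoted Cypher identifiers cannot start with a digit, so add a prefix.
    if cleaned[0].isdigit():
        cleaned = "R_" + cleaned
    return cleaned


def triple_to_cypher(subject: str, relation: str, obj: str) -> tuple[str, dict]:
    """Build a MERGE query and its parameters for one triple.

    Node values travel as query parameters. Relationship types cannot be
    parameterized in Cypher, so the type is sanitized and inlined instead.
    The Entity label and name property are assumptions of this sketch.
    """
    rel_type = to_relationship_type(relation)
    query = (
        "MERGE (a:Entity {name: $subject}) "
        "MERGE (b:Entity {name: $object}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"subject": subject, "object": obj}


query, params = triple_to_cypher("John", "studies at", "MIT")
print(query)
print(params)
```

Using MERGE rather than CREATE means that running the same text through the system twice would not duplicate nodes or relationships, which suits a graph that is expanded incrementally.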
The main workflow (src/graph.py) orchestrates the entire process, including:
- Routing user input
- Generating knowledge graphs
- Finding relevant nodes
- Answering questions or performing web searches
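The control flow listed above can be sketched in plain Python. This is not the LangGraph implementation in src/graph.py: the state fields, function names, and the question-mark routing rule are all assumptions for illustration, and the real system routes requests with the LLM pipeline rather than a string check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class WorkflowState:
    """Hypothetical state object passed between workflow steps."""
    user_input: str
    route: str = ""
    relevant_nodes: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    trace: List[str] = field(default_factory=list)


def route_input(state: WorkflowState) -> WorkflowState:
    # Stand-in for LLM routing: treat input ending in "?" as a question.
    is_question = state.user_input.strip().endswith("?")
    state.route = "question" if is_question else "generate"
    state.trace.append("route")
    return state


def run_workflow(
    state: WorkflowState,
    generate_graph: Callable[[str], None],
    find_nodes: Callable[[str], List[str]],
    answer_from_graph: Callable[[str, List[str]], str],
    web_search: Callable[[str], str],
) -> WorkflowState:
    """Route the input, then either build the graph or answer the question."""
    route_input(state)
    if state.route == "generate":
        generate_graph(state.user_input)
        state.trace.append("generate_graph")
        return state
    state.relevant_nodes = find_nodes(state.user_input)
    state.trace.append("find_nodes")
    if state.relevant_nodes:
        state.answer = answer_from_graph(state.user_input, state.relevant_nodes)
        state.trace.append("answer_from_graph")
    else:
        # Nothing relevant in the graph: fall back to web search.
        state.answer = web_search(state.user_input)
        state.trace.append("web_search")
    return state
```

Passing the four steps in as callables keeps the sketch independent of any particular LLM or database client, and makes each branch easy to exercise with stubs.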
Here's a visual representation of the workflow:
The Streamlit-based UI (src/ui.py) provides a user-friendly interface for interacting with the system.
The project uses YAML configuration files for both the LLM and database settings:
- LLM config: configs/LLM/openai.yaml or configs/LLM/bedrock.yaml
- Database config: configs/GraphDatabase/neo4j.yaml
- Make sure you have Docker and Docker Compose installed on your system.
- Clone the repository and navigate to the project directory.
- Create a .env file in the root directory and add your environment variables:
  OPENAI_API_KEY=your_openai_api_key
  TAVILY_API_KEY=your_tavily_api_key
- Build and run the containers: docker-compose up --build
- Access the Streamlit UI at http://localhost:8501
To run the project locally without Docker:
- Install dependencies: poetry install
- Set up environment variables in a .env file.
- Run the Streamlit UI: streamlit run src/ui.py
- Implement caching for improved performance
- Add support for more LLM providers
- Enhance the graph visualization capabilities
- Implement user authentication and multi-user support
- Monitoring and logging
The project requires certain environment variables to be set in a .env file in the root directory. The required variables differ slightly depending on whether you're using OpenAI or AWS (Bedrock) as your LLM provider.
For OpenAI:
LLM_CONFIG_PATH=configs/LLM/openai.yaml
GRAPH_DATABASE_CONNECTION_PATH=configs/GraphDatabase/neo4j.yaml
OPENAI_API_KEY=your_openai_api_key
TAVILY_API_KEY=your_tavily_api_key

For AWS (Bedrock):
LLM_CONFIG_PATH=configs/LLM/bedrock.yaml
GRAPH_DATABASE_CONNECTION_PATH=configs/GraphDatabase/neo4j.yaml
AWS_ACCESS_KEY_ID=your_aws_access_key_id
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
AWS_DEFAULT_REGION=your_aws_region
TAVILY_API_KEY=your_tavily_api_key
Make sure to replace the placeholder values with your actual API keys and credentials. Keep your .env file secure and never commit it to version control.
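A small pre-flight check can catch a missing variable before the app starts. The sketch below is not part of the project: the function names are hypothetical, and inferring the provider from the file name in LLM_CONFIG_PATH is an assumption based only on the two config paths shown above.

```python
import os
from typing import List, Mapping

# Variables needed regardless of the LLM provider.
COMMON_VARS = ["LLM_CONFIG_PATH", "GRAPH_DATABASE_CONNECTION_PATH", "TAVILY_API_KEY"]

# Provider-specific variables, as listed in this README.
PROVIDER_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"],
}


def detect_provider(env: Mapping[str, str]) -> str:
    """Guess the provider from the config file name (an assumption of this sketch)."""
    file_name = os.path.basename(env.get("LLM_CONFIG_PATH", "")).lower()
    for provider in PROVIDER_VARS:
        if file_name.startswith(provider):
            return provider
    raise ValueError("LLM_CONFIG_PATH does not point to a known provider config")


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    required = COMMON_VARS + PROVIDER_VARS[detect_provider(env)]
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    try:
        missing = missing_variables(os.environ)
    except ValueError as exc:
        print(f"Configuration problem: {exc}")
    else:
        print("Missing variables:", ", ".join(missing) or "none")
```

The functions take a mapping rather than reading os.environ directly, so the same check works on values parsed from a .env file or on a test dictionary.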
After setting up the project, you can interact with the system through the Streamlit UI or by using the API directly.
- Start the Streamlit app: streamlit run src/ui.py
- Open your web browser and navigate to http://localhost:8501.
- You'll see a text input area where you can enter your text or question.
- For generating a knowledge graph:
  - Enter a paragraph or article of text.
  - Click the "Run" button.
  - The system will generate a knowledge graph and display it visually.
- For asking questions:
  - Enter your question in the text area.
  - Click the "Run" button.
  - The system will attempt to answer based on the existing knowledge graph.
  - If the answer isn't found in the graph, it will perform a web search.
- Generate a knowledge graph:
  - Input: "John is a student at MIT. He is studying computer science and is passionate about artificial intelligence."
  - The system will create nodes for John, MIT, computer science, and artificial intelligence, with appropriate relationships.
- Ask a question:
  - Input: "What is John studying?"
  - The system will query the knowledge graph and respond with "John is studying computer science."
- Ask a question not in the graph:
  - Input: "What is the capital of France?"
  - The system will perform a web search and provide an answer based on the search results.
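To make the examples above concrete, here is a minimal sketch of how the first input could be represented as triples and queried. The relationship names (STUDENT_AT, STUDIES, PASSIONATE_ABOUT) are illustrative assumptions; the actual relationships are chosen by the LLM and may differ. The lookup is a plain in-memory stand-in for a graph query.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples for the example sentence about John.
TRIPLES: List[Triple] = [
    ("John", "STUDENT_AT", "MIT"),
    ("John", "STUDIES", "computer science"),
    ("John", "PASSIONATE_ABOUT", "artificial intelligence"),
]


def outgoing(triples: List[Triple], entity: str) -> Dict[str, List[str]]:
    """Group the targets of an entity's outgoing edges by relationship."""
    result: Dict[str, List[str]] = {}
    for subject, relation, target in triples:
        if subject == entity:
            result.setdefault(relation, []).append(target)
    return result


print(outgoing(TRIPLES, "John").get("STUDIES"))  # ['computer science']
print(outgoing(TRIPLES, "France"))               # {}
```

The empty result for "France" corresponds to the third example: nothing relevant is in the graph, so the system would fall back to a web search.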
This project is licensed under the MIT License. See the LICENSE file for details.
We welcome contributions to improve this project! Here's how you can contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with clear, descriptive messages.
- Push your changes to your fork.
- Submit a pull request to the main repository.
Please ensure your code adheres to the project's coding standards and include tests for new features or bug fixes.
If you encounter any bugs or have suggestions for improvements, please open an issue on the GitHub repository. Provide as much detail as possible, including steps to reproduce the issue and your environment details.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Thank you for your interest in contributing to our AI-Powered Knowledge Graph and Question Answering System!
This image shows the process of generating a knowledge graph from input text.
The system includes a monitoring utility that leverages Langfuse to track and analyze the performance of the knowledge graph generation and question answering processes. This utility provides real-time insights into system performance, allowing for quick identification and resolution of any issues that may arise.