feat: ✨ Add a text-based RAG chatbot
Tanish2207 committed Oct 30, 2024
1 parent a785cb0 commit a4df74a
Showing 4 changed files with 294 additions and 0 deletions.
146 changes: 146 additions & 0 deletions Generative Models/RAG-Chatbot/ML_Nexus.txt
@@ -0,0 +1,146 @@
Welcome to the ML-NEXUS Zone⚙️⏳

ML-Nexus
A dynamic hub of Machine Learning innovations, where hands-on projects and collaborative experiments come together to inspire open-source contributions and foster a community of shared learning.

This repository is a diverse collection of projects ranging from beginner-friendly models to advanced AI applications. Whether you're new to the field or a seasoned expert, there's something for everyone to contribute to. Dive into neural networks, computer vision, natural language processing (NLP), and more. Join our vibrant community, share your ideas, and help shape the future of AI—together!

NOTE: You're limited to earning a maximum of 200 points from this repo. Additionally, we can't accept any ideas or features if your score already exceeds 200 points.

Join the official Discord channel for discussion.


Natural Language Processing (NLP)
Projects in this area involve working with text data, such as sentiment analysis, language translation, text summarization, and chatbot development, using techniques like tokenization, word embeddings, and transformers.



Computer Vision
Contributors can explore projects related to image classification, object detection, facial recognition, and image segmentation using tools like OpenCV, convolutional neural networks (CNNs), and transfer learning.



Neural Networks
Neural networks power most deep learning models. Contributions could include creating models for image classification, regression tasks, sequence prediction, and generative models using frameworks like TensorFlow or PyTorch.



Generative Models
This area covers projects such as Generative Adversarial Networks (GANs) for image generation, text-to-image models, and style transfer, contributing to fields like art creation and synthetic data generation.



Time Series Analysis
Contributors can work on analyzing temporal data, building models for stock price prediction, climate forecasting, or IoT sensor data analysis using LSTM or GRU networks.




Transfer Learning
Explore projects where pre-trained models are fine-tuned for specific tasks, such as custom object detection or domain-specific text classification, reducing the need for extensive training data.


📚 Machine Learning Resources
This project uses a number of key libraries to implement machine learning models and data processing pipelines. To help you better understand these libraries and their roles in the project, we've created a dedicated guide.

For an in-depth overview of the most important libraries used in this project, including their features and functionalities, check out the Machine Learning Libraries Overview.

This guide covers:

NumPy 🧮 for numerical computations.
Pandas 📊 for data manipulation.
TensorFlow 🤖 and PyTorch 🔥 for deep learning.
And more!
We encourage you to explore this document to gain a deeper understanding of the tools that power our machine learning workflows.

📚 Generative AI resources
For an in-depth overview and a roadmap for learning Generative AI, check out the Generative AI Roadmap.

This guide covers:

Overview of generative AI
Roadmap to learn Generative AI
LLM models 🤖
Retrieval-Augmented Generation (RAG)
Vector and graph databases
Embedding models
Inference APIs
PDF scraping 🗒️
AI agents 🤖

📚 Deep Learning Roadmap
For an in-depth overview and a roadmap for learning Deep Learning, check out the Deep Learning Roadmap.

This guide covers:

Overview of deep learning
Roadmap to learn deep learning
Types of neural networks 🧠
Key deep learning concepts
Regularization techniques 💡
Model optimization 🔧
Transfer learning 🚀
Deep learning applications 📷📝🔊
Best practices and resources


⭐ How to get started with open source?


You can refer to the following articles on the basics of Git and GitHub.

If you're new to open source, watch this video to get started.
Forking a Repo
Cloning a Repo
How to create a Pull Request
Getting started with Git and GitHub



💥 How to Contribute to ML-Nexus?
Take a look at the existing issues or create your own issue!
Wait for the issue to be assigned to you.
Fork the repository to your own GitHub account using the Fork button at the top of the repository page.

Clone the repository to your local machine:

git clone https://github.com/<your-username>/ML-Nexus.git
Navigate into the directory:

cd ML-Nexus
Install dependencies (if applicable):

npm install
Create a new branch for your changes:

git checkout -b <your-branch-name>
Make your changes, commit, and push:

git add .
git commit -m "Your message here"
git push origin <your-branch-name>
Submit a pull request:

Go to the original repository on GitHub.
Click on the "Pull Requests" tab.
Click the "New Pull Request" button.
Select your feature branch and submit the pull request.
Wait for review and feedback.

Address any comments or requested changes.
Once approved, your feature will be merged into the main branch.
Have a look at the Contributing Guidelines.
Read the Code of Conduct

❤️ Project Admin

👑 Admin
Kalyani

💻 Project Mentors

Sai Nivedh V 🔧 Mentor
Pratyay Banerjee 🔧 Mentor
Binary file added Generative Models/RAG-Chatbot/image.png
119 changes: 119 additions & 0 deletions Generative Models/RAG-Chatbot/rag_llama.py
@@ -0,0 +1,119 @@
import ollama
import faiss
import numpy as np
from uuid import uuid4
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

# Sentence-embedding model (384-dimensional vectors), run on the CPU.
modelPath = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": False}

embeddings = HuggingFaceEmbeddings(
    model_name=modelPath, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

# Read the source document and split it into small, overlapping chunks.
with open("ML_Nexus.txt", "r", encoding="utf-8") as file:
    document_text = file.read()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=90, chunk_overlap=30)
chunks = text_splitter.split_text(document_text)

# Wrap each chunk in a Document and give each one a unique ID.
document_obj = [Document(page_content=chunk) for chunk in chunks]
uuids = [str(uuid4()) for _ in range(len(document_obj))]

# Embed every chunk; FAISS expects a float32 matrix of shape (n_chunks, dim).
final_emb = np.array(
    embeddings.embed_documents([doc.page_content for doc in document_obj]),
    dtype=np.float32,
)

# Exact (brute-force) index using L2 distance.
index = faiss.IndexFlatL2(final_emb.shape[1])
index.add(final_emb)

# Row i of the index corresponds to uuids[i], which maps to document_obj[i].
index_to_docstore_id = {i: uuids[i] for i in range(len(uuids))}
docstore = InMemoryDocstore(dict(zip(uuids, document_obj)))

# The index and docstore are fully populated at this point. Calling
# vector_store.add_documents() as well would embed and insert every chunk a
# second time, so a k=5 search could return the same chunk twice.
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=docstore,
    index_to_docstore_id=index_to_docstore_id,
)


def get_context(query, k=5):
    """Return the k chunks most similar to `query`, joined into one string."""
    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    similar_search_result = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in similar_search_result)


def llm(question):
    """Answer `question` with Llama 3.1 (via Ollama), grounded in retrieved context."""
    compiled_context = get_context(question)

    # Adjacent string literals are concatenated into a single prompt template.
    formatted_prompt = (
        "Answer the question below with the context.\n\n"
        "Context:\n\n{}\n\n----\n\n"
        "Question: {}\n\n"
        "Write an answer based on the context. "
        "If the context provides insufficient information, reply "
        '"The information given in the context is insufficient. Thus, answering without context:" '
        "and then answer the question with the existing knowledge you have. "
        "If quotes are present and relevant, use them in the answer."
    ).format(compiled_context, question)

    res = ollama.chat(
        model="llama3.1:latest",
        messages=[{"role": "user", "content": formatted_prompt}],
    )
    return res["message"]["content"]


queries = [
    "Who are the mentors of the given project?",
    "What is the maximum number of points I can earn from this repository?",
]
for query in queries:
    print(f"Question: {query}\n")
    print(llm(query))
    print("-" * 100)
    print()
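`faiss.IndexFlatL2` performs exact, exhaustive search: for a query vector it computes the squared L2 distance to every stored vector and returns the k closest rows. The pure-Python sketch below illustrates that idea only; it is not FAISS's implementation, and the helper names `squared_l2` and `knn_l2` and the toy 2-D vectors are hypothetical stand-ins for the 384-dimensional chunk embeddings.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_l2(
    index: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Return (distance, row) pairs for the k stored vectors closest to `query`.

    Hypothetical helper: a brute-force scan over every row, the same kind of
    exact search a flat L2 index performs.
    """
    scored = ((squared_l2(vec, query), row) for row, vec in enumerate(index))
    return heapq.nsmallest(k, scored)


# Toy 2-D "embeddings" standing in for real chunk vectors.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
print(knn_l2(vectors, [0.0, 0.1], k=2))
```

In the script, each returned row number i is translated to a chunk through `index_to_docstore_id[i]` and the docstore, which is how the nearest vectors become the `Document` objects joined in `get_context`.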
29 changes: 29 additions & 0 deletions Generative Models/RAG-Chatbot/readme.md
@@ -0,0 +1,29 @@
# RAG Chatbot
Most large language models can only provide information based on the corpus of data that they’ve been trained on.
These models might hallucinate if they don't have the required data or context.
This is where RAG (Retrieval-Augmented Generation) helps.

By incorporating a retriever, RAG pulls relevant passages from external knowledge sources, such as databases or documents, and adds them to the prompt, so the generated answer is grounded in that context rather than only in what the model learned during training. This mitigates the limitations of static training data and lets each response draw on the information most relevant to the query.
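The retrieve-then-generate flow can be sketched without any model at all. This is a minimal illustration under loud assumptions: simple word overlap stands in for the embedding similarity this chatbot actually uses, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of `rag_llama.py`. The returned string is what would be sent to the LLM.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word set; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Augment the user's question with the retrieved passages."""
    context = "\n\n".join(retrieve(query, passages, k))
    return (
        "Answer the question using the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


knowledge = [
    "You can earn a maximum of 200 points from this repo.",
    "Join the official Discord channel for discussion.",
    "Fork the repository to your own GitHub account.",
]
prompt = build_prompt("What is the maximum number of points?", knowledge, k=1)
print(prompt)
```

Only the best-matching passage reaches the prompt, so the model is asked about the 200-point limit with that fact in front of it instead of relying on its training data.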

---

The following technologies are used to build this RAG-based chatbot:

- `sentence-transformers/all-MiniLM-L6-v2` as the sentence-embedding model
- LangChain's `RecursiveCharacterTextSplitter` for chunking text
- Llama 3.1 (run locally through Ollama) as the LLM
- FAISS as the vector store
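The `chunk_size=90, chunk_overlap=30` settings in `rag_llama.py` mean each chunk holds at most 90 characters and neighbouring chunks can share up to 30. The sketch below is a simplified fixed-window splitter, a swapped-in technique with a hypothetical name (`sliding_chunks`), meant only to show what size and overlap do. LangChain's `RecursiveCharacterTextSplitter` additionally tries to break on separators (paragraphs, newlines, spaces) before falling back to single characters, so its chunks will differ.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 90, chunk_overlap: int = 30) -> List[str]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so text cut at one
    boundary also appears in the neighbouring chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 200
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

With a 200-character input and a step of 60, the windows start at 0, 60, and 120, giving chunk lengths of 90, 90, and 80.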

---
### Outputs

I made a sample text file, `ML_Nexus.txt`, from the README of the [ML-Nexus](https://github.com/UppuluriKalyani/ML-Nexus) repository.

* Questions I asked:
1. Who are the mentors of the given project?
2. What is the maximum number of points I can earn from this repository?

* Answers:


![Chatbot answers to the two sample questions](image.png)
