Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review - Improve changes and move to rag_agent #4

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

chimosky
Copy link
Member

@chimosky chimosky commented Oct 5, 2024

Move RAG logic to a new file, use a class
for the functionalities.

Combine requirements.txt, no need for two separate files.

TODO

  • Migrate from RetrivevalQA as it's deprecated.
  • Make sure langchain tools used are up to date.
  • Create docs folder and add docs for RAG there.
  • self.prompt seems to be used just once, why? I'm guessing migrating from RetrievalQA might solve this.
  • Redirect output to right sources instead of using print statements.

@kshitijdshah99 @XXXJumpingFrogXXX kindly review.
I'll leave the other changes to you Kshitij.

This PR is open and we'll use it to track the TODO above, and will merge once you've made the necessary changes.

Move RAG logic to a new file, use a class
for the functionalities.

Combine requirements.txt, no need for two separate files.

Signed-off-by: Chihurumnaya Ibiam <[email protected]>
Signed-off-by: Chihurumnaya Ibiam <[email protected]>
@kshitijdshah99
Copy link
Contributor

kshitijdshah99 commented Oct 6, 2024

@chimosky I'm trying to manage the dependencies in my Conda environment for my AI model, but I'm facing some challenges. When I run the pip freeze command, I see a long list of installed packages, many of which seem interconnected. This makes it difficult for me to determine which dependencies are actually required for my project.

Additionally, when I installed the packages listed in my requirements.txt, many other dependencies were automatically downloaded. This is further complicating my ability to classify and understand which packages I really need.

Dependencies in my conda environment

aiohappyeyeballs==2.4.3
aiohttp==3.10.9
aiosignal==1.3.1
altair==5.4.1
annotated-types==0.7.0
anyio==4.6.0
attrs==24.2.0
blinker==1.8.2
cachetools==5.5.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
dataclasses-json==0.6.7
faiss-cpu==1.9.0
filelock==3.16.1
frozenlist==1.4.1
fsspec==2024.9.0
gitdb==4.0.11
GitPython==3.1.43
greenlet==3.1.1
h11==0.14.0
httpcore==1.0.6
httpx==0.27.2
huggingface-hub==0.25.2
idna==3.10
Jinja2==3.1.4
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
langchain==0.3.3
langchain-community==0.3.2
langchain-core==0.3.10
langchain-experimental==0.3.2
langchain-huggingface==0.1.0
langchain-ollama==0.2.0
langchain-text-splitters==0.3.0
langsmith==0.1.132
markdown-it-py==3.0.0
MarkupSafe==3.0.1
marshmallow==3.22.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.9.2
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.77
nvidia-nvtx-cu12==12.1.105
ollama==0.3.3
orjson==3.10.7
packaging==24.1
pandas==2.2.3
pillow==10.4.0
propcache==0.2.0
protobuf==5.28.2
pyarrow==17.0.0
pydantic==2.9.2
pydantic-settings==2.5.2
pydantic_core==2.23.4
pydeck==0.9.1
Pygments==2.18.0
PyMuPDF==1.24.11
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
referencing==0.35.1
regex==2024.9.11
requests==2.32.3
requests-toolbelt==1.0.0
rich==13.9.2
rpds-py==0.20.0
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
sentence-transformers==3.1.1
setuptools==75.1.0
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
SQLAlchemy==2.0.35
streamlit==1.39.0
sympy==1.13.3
tenacity==8.5.0
threadpoolctl==3.5.0
tokenizers==0.20.0
toml==0.10.2
torch==2.4.1
tornado==6.4.1
tqdm==4.66.5
transformers==4.45.2
triton==3.0.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
watchdog==5.0.3
wheel==0.44.0
yarl==1.14.0

@kshitijdshah99
Copy link
Contributor

@chimosky @walterbender I am almost done with other changes you asked me in the TODO.
I have modified the RAG a little bit and enhanced it such that it answer's only coding queries of kids and no other queries.

I have implemented it currently using this where it checks for some keywords in this list and based on that it determines if kids query if coding based or redundant. This is just a basic layout I am planning to scale it even further but later after integration is done. If we want we can ignore this part too, let me know your opinions?

coding_keywords = [
    'code', 'Python', 'bug', 'error', 'exception', 'syntax', 'Pygame', 'debug', 
    'program', 'programming', 'variable', 'loop', 'if', 'function', 'print', 
    'input', 'string', 'number', 'list', 'tuple', 'dictionary', 'module', 
    'import', 'turtle', 'indentation', 'syntax', 'Pygame', 'math', 'shape', 
    'for', 'while', 'repeat', 'class', 'object', 'method', 'attribute', 
    'operator', 'condition', 'true', 'false', 'boolean', 'integer', 'float', 
    'error', 'bug', 'debug', 'fix', 'type', 'index', 'range', 'len', 'append', 
    'pop', 'remove', 'sort', 'reverse', 'random', 'sleep', 'time', 'pygame', 
    'turtle', 'function', 'variable', 'list', 'loop', 'error', 'syntax', 
    'module', 'import', 'return', 'break', 'continue', 'GTK', 'logic', 'numbers'
]

@walterbender
Copy link
Member

How do you use the keyword list? I presume it will be extended?

@kshitijdshah99
Copy link
Contributor

It checks whether child's query contains words present in the list if yes then it answers them. Yes I think it will be extended, because for even advanced tools like chatGPT I tried opening a Python assistant and asked some non coding questions they too answered them, hence I feel advocating this method would be good in our case(as we are dealing with kids and we can't provide them redundant answers).

@chimosky
Copy link
Member Author

chimosky commented Oct 6, 2024

Answering only coding queries doesn't make much sense to me and shouldn't be something we aim for.

Like you mentioned above, what happens when there are non code related queries if we restrict the responses to just code based queries?

Also, it seems you've not reviewed my changes or working based on my branch, because if you did you'd notice the Pippy directory no longer exists so you wouldn't be making changes there.
The todo also said to create a docs directory and put the docs there.

Please review my changes.

Also base your changes on this branch, so it'll be easier to just merge here rather than cherry-pick and push here.

Signed-off-by: Chihurumnaya Ibiam <[email protected]>
@kshitijdshah99
Copy link
Contributor

Yeah I skipped the changes you made on your branch I misunderstood for the main. I feel we can make this chatbot generic too I have no issue with that but the purpose why it's placed inside Pippy is that it generates code itself and not deviate from it's purpose, by code I mean that it will also provide explanation, examples and relevant code snippet. @walterbender what's your opinion on this?
I'll make the other changes soon as you directed.

@chimosky
Copy link
Member Author

chimosky commented Oct 8, 2024

"...but the purpose why it's placed inside Pippy is that it generates code itself and not deviate from it's purpose, by code I mean that it will also provide explanation, examples and relevant code snippet..."

I don't understand what you mean by this, could you explain better?

@kshitijdshah99
Copy link
Contributor

"...but the purpose why it's placed inside Pippy is that it generates code itself and not deviate from it's purpose, by code I mean that it will also provide explanation, examples and relevant code snippet..."

I don't understand what you mean by this, could you explain better?

I mean that the way we have an AI-assistant in chat Activity for chatting purpose similarly this bot would be exclusively restricted to coding and python learning purpose. Bcz making it generic would loose it's value as it's placed in Pippy itself so the domain it's operating in must be specific.

@chimosky
Copy link
Member Author

chimosky commented Oct 10, 2024

I mean that the way we have an AI-assistant in chat Activity for chatting purpose similarly this bot would be exclusively restricted to coding and python learning purpose. Bcz making it generic would loose it's value as it's placed in Pippy itself so the domain it's operating in must be specific.

I got that part, doesn't make it a great reason to place it inside a Pippy directory, the rag_agent file does the job perfectly. We can keep any functionality this bot and the chatbot shares together and any other thing separate.

@kshitijdshah99
Copy link
Contributor

kshitijdshah99 commented Oct 11, 2024

@chimosky I have looked out for the dependencies of langchain they are quite updated not latest because on updating some it's giving conflicts with others, currently it's working great with python env 3.12 . I have made fixes to #6 kindly review it.

langchain==0.3.3
langchain-community==0.3.2
langchain-core==0.3.10
langchain-experimental==0.3.2
langchain-huggingface==0.1.0
langchain-ollama==0.2.0
langchain-text-splitters==0.3.0
langsmith==0.1.132

@chimosky
Copy link
Member Author

Reviewed, opened a PR that needs your attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants