Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions: tree sitter, git, ollama #60

Open
magaton opened this issue Mar 4, 2024 · 12 comments
Open

Questions: tree sitter, git, ollama #60

magaton opened this issue Mar 4, 2024 · 12 comments
Labels
good first issue Good for newcomers

Comments

@magaton
Copy link

magaton commented Mar 4, 2024

Hello, interesting project and architecture.
I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Also, why did you decide to use pre-commit hooks instead of pullling git repository with a scheduler. Llama index github reader could be leveraged in that case.

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

Thanks

@magaton
Copy link
Author

magaton commented Mar 6, 2024

Anyone?

@Umpire2018
Copy link
Collaborator

I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Hi there! Thank you for your suggestion. I will look into it. It would be great if you can share any details with me.

@Umpire2018
Copy link
Collaborator

image

Reference: here

We applied

  1. Abstract Syntax Tree (AST) to extract all Classes and Functions within the file, including their type, name, code snippets, etc which is similar with

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited).

image
image

  1. Jedi to find_all_referencer of single function in repo_agent/doc_meta_info.py Line 270 .

Seems like tree-sitter is better than ast because it provides multiple programming language support.
image

Correct me if i am wrong. @LOGIC-10

@magaton
Copy link
Author

magaton commented Mar 8, 2024

Thanks for the response guys. Again, excellent work. I am on the same boat and the support for the multiple programming languages is a stopper from using your project.

As I can see you do Python AST + Jedi for the function calls.
Replacing python AST with tree-sitter could bring you closer to multi-lnaguage support , but Jedi is usable only for python.

AST is only one layer and here with Jedi you want to add function calls into the picture.

But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph)
and a reference implementation called Joern:

Have you maybe considered that?

@Umpire2018
Copy link
Collaborator

AST is only one layer and here with Jedi you want to add function calls into the picture.

But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:

I wonder if CPG have a python implementation? https://github.com/markgacoka/codepropertygraph may not be a good choice.

And the goal is to replace AST + Jedi via one or multiple library in order to acheieve multi-language support.

@magaton
Copy link
Author

magaton commented Mar 11, 2024

I am using Joern for CPG -> Neo4j, but that is scala
There is also https://pypi.org/project/cpggen/ in python

@Umpire2018
Copy link
Collaborator

AppThreat/cpggen: This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

It seems that now is not a good time to introduce CPG but we will definitely consider tree sitter.

@magaton
Copy link
Author

magaton commented Mar 13, 2024

Understood, but when you use tree-sitter, maybe you can only take its CST output and use a code chunker from llama index
https://docs.sweep.dev/blogs/chunking-improvements

@Umpire2018
Copy link
Collaborator

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

Seems like Ollama have provided openai-compatibility so i think support Ollama or others open source llm is not high priority.

Right now we only used Chat completions ablility.

Similar projects for reference are as follows:

  1. vllm
  2. llama-cpp-python
  3. Ollama

@Umpire2018 Umpire2018 added the good first issue Good for newcomers label Mar 22, 2024
@Major-wagh
Copy link

Hello, I too wanted support for languages other then python. Does anybody know the approach or neccsessary changes to be done to the existing code repository?

@biandan
Copy link

biandan commented Jul 14, 2024

openai很多地方无法使用,我也期待支持ollama

@sandeshchand
Copy link

sandeshchand commented Aug 7, 2024

Is there any method/approach for supporting multiple programming language to find_all_referencer of single function ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants