Local chat and embedding, much more accurate RAG with reranking!
If you have any problems when using this GGA release, please raise an issue.
In the open-source community, LLMs and embedding models have made great progress, with many excellent models emerging. For individuals, using these open-source models keeps private information entirely on your own machine and delivers an experience close to 90% of hosted services, without paying anything extra!
As a result, I spent some time researching and testing the usage and effectiveness of these models so that GGA could adapt them as well as possible. GGA now has integrated support for Ollama and several local embedding models, for an optimal experience on the PC.
Ollama SUPPORT !!!
Ollama allows you to run open-source large language models, such as Llama 2, locally. It optimizes setup and configuration details, including CPU and GPU usage.
To use Ollama, follow these steps:
- Download and install Ollama on one of the supported platforms.
- Run Ollama, open your CLI (command line), and use `ollama pull qwen:7b-chat` to download the model (the best fit for PC in both English and Chinese).
- Now you can run GGA and enjoy it!
The model `qwen:7b-chat` optimally balances performance and speed, and its 32k context length makes it ideal for RAG. Other chat models will be supported soon.
You can get more details in its repo.
The actual inference time depends on the model size and your hardware performance.
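Once the model is pulled and Ollama is running, it serves a local REST API (by default on port 11434) that tools like GGA can talk to. A minimal sketch of one chat turn against Ollama's documented `/api/chat` endpoint (the helper names here are mine, not GGA internals):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(model: str, user_message: str) -> dict:
    """Build a non-streaming chat payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # request a single JSON response instead of a stream
    }


def chat(model: str, user_message: str) -> str:
    """Send one chat turn to a locally running Ollama server."""
    payload = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example (requires a running Ollama server and a pulled model):
# print(chat("qwen:7b-chat", "Hello!"))
```

Since everything stays on `localhost`, no prompt or document ever leaves your machine.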
LOCAL EMBEDDING SUPPORT !!!
Embedding some big files can cost a considerable amount of money, so let's try the BGE embedding models. You can find more info in their Hugging Face blog and GitHub repo.
In GGA, all you have to do is choose it, and GGA will take care of everything else for you.
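Local embedding retrieval works the same way as with any hosted embedding API: each text chunk is mapped to a vector, and retrieval ranks chunks by cosine similarity to the query vector. A model-agnostic sketch with toy 3-dimensional vectors standing in for real BGE outputs:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of the same dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embedding model outputs.
query = [1.0, 0.0, 0.0]
docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.0, 0.2]]
print(top_k(query, docs, k=2))  # prints [0, 2]
```

This also shows why vectors produced by different embedding models must never be mixed in one knowledge base: cosine similarity is only meaningful between vectors of the same model and dimension (see the related fix below).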
Rerank SUPPORT!!!
For background on reranking, you can check this blog. In short, reranking offers a straightforward, low-complexity way to refine search results, integrating semantic relevance into existing search systems without significant infrastructure changes.
In GGA, `bge-reranker` is the default rerank model.
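The retrieve-then-rerank flow can be sketched as follows. The `cross_encoder_score` stand-in below simply counts query-term overlap so the example stays self-contained; the real bge-reranker instead scores each (query, passage) pair jointly with a cross-encoder model:

```python
def cross_encoder_score(query: str, passage: str) -> float:
    """Stand-in scorer: fraction of query terms found in the passage.
    A real reranker (e.g. bge-reranker) scores the (query, passage)
    pair with a cross-encoder model instead."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms)


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Re-order first-stage retrieval results by reranker score."""
    return sorted(
        candidates,
        key=lambda p: cross_encoder_score(query, p),
        reverse=True,
    )[:top_n]


# First-stage (embedding) retrieval might return these candidates:
candidates = [
    "ollama runs large language models locally",
    "reranking refines search results with semantic relevance",
    "the weather is nice today",
]
# The reranker promotes the most relevant passage to the top.
print(rerank("how does reranking refine search results", candidates, top_n=2))
```

Because the second stage only re-orders the first stage's candidates, it slots into an existing retrieval pipeline without changing how documents are stored or indexed.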
What Else Changed
- Openai v1.0 and Langchain v0.1 Migration Completed. by @Wannabeasmartguy in #4
- RAG Web Search and better display of knowledge base information by @Wannabeasmartguy in #5
Fix:
- Mixing of embedding vectors with different dimensions, triggered by switching knowledge bases.
- Messed-up typography when the browser zooms in or out.
- Display errors in some components.
Add:
- RAG Web Search history can now be saved locally, under `search_his`.
- Re-ranking of the similarity search results is now performed by default when doing RAG Q&A, which significantly improves the accuracy of the search results. The re-rank model defaults to `bge-reranker-large`, which is saved under `. /embedding model`. You can manually turn it off via the checkbox under RAG Basic Setting in the right column.
Full Changelog: v0.10.1...v0.14.1.2