-
Thanks. We have a few team members who specialize in smaller, open-source models. This is on the roadmap. Good resources!
-
I see that a similar discussion already exists here: #10.
However, that thread is about using different models for different layers or layer components, whereas this suggestion is to replace every use of the OpenAI API with a local implementation, using my reference model or any model that is better.
I feel it warrants a separate discussion.
When watching Dave's demo of the project, a big standout was his remark about the API timing out after running the demo only briefly, together with the sheer number of inferences that will need to be generated.
I don't think this limitation is necessary, and depending on a third party is not ideal. The limiting factor should instead be the amount of compute available, and getting this to run on consumer hardware would be best.
As such, I suggest using the dolphin-2.1-mistral-7b model, specifically a quantised version with a maximum RAM requirement of only 7.63 GB and a download size of only 5.13 GB, run through the llama-cpp-python bindings, which keeps the project within its Python-only requirement.
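For a concrete picture, here is a minimal sketch of what the swap could look like with the llama-cpp-python bindings. The model path, chat format, and generation parameters are my assumptions for illustration, not anything from the project's code:

```python
# Minimal sketch (not project code): replacing an OpenAI chat-completion call with a
# local dolphin-2.1-mistral-7b completion via llama-cpp-python.
# The model path, chat format, and generation parameters below are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/dolphin-2.1-mistral-7b.Q5_K_M.gguf",  # assumed path to a quantised GGUF file
    n_ctx=4096,            # context window size
    n_gpu_layers=0,        # CPU-only; raise this to offload layers if a GPU is available
    chat_format="chatml",  # dolphin-2.1 uses the ChatML prompt format
)

# llama-cpp-python exposes an OpenAI-style chat-completion method, so code that
# currently builds ChatCompletion requests would need only minor changes.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the project goals in one sentence."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```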
There are benefits to doing it this way:
Benefits to this model specifically:
Benefits to using the llama-cpp-python bindings:
This is just a suggestion, and this particular model will probably be outdated within a week, but I think this approach is truly the right way to go.