Added Support for Apple Silicon #1289
base: main
- No GGUF support yet.
- Build Triton and bitsandbytes from source.
- `cmake -DCOMPUTE_BACKEND=hip -S .` for building bitsandbytes.
Is this working?
Hi there, thank you for this! We will need a bit more time to review. :)
Hi @shashikanth-a, thank you for this. Could you please provide information about the environment and package versions you used for development?
Hey, does this work with the newly released vision support?
Currently I can run this if:
- Lazy loading of the model
- Minor refactoring
- Optimizers and LR schedulers
- gc
- Should improve memory consumption
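The "gc" item above is about freeing memory during training. A minimal sketch of that idea, assuming PyTorch 2.x (where `torch.mps.empty_cache()` exists); the helper name `free_memory` is hypothetical and not taken from the PR:

```python
import gc


def free_memory() -> int:
    """Hypothetical helper (not from the PR): run Python's garbage collector,
    then release PyTorch's cached MPS allocations when running on Apple Silicon.

    Returns the number of unreachable objects found by gc.collect().
    """
    collected = gc.collect()
    try:
        import torch  # optional: the helper still works without PyTorch installed
    except ImportError:
        return collected
    # torch.mps.empty_cache() exists in PyTorch 2.x; only call it when the
    # MPS backend is actually usable on this machine.
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
    return collected


if __name__ == "__main__":
    print(f"gc collected {free_memory()} objects")
```

Calling it after deleting a model or between training phases returns cached memory to the system; it costs a little time and does not by itself shrink what the model needs while it is running.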
With the changes, I can run this out of the box using the steps outlined above:
On an M4 Pro I'm getting around 100 t/s for llama3-8b. I can confirm it now also works with llama-3.2-3b.
Thanks a lot! Would anyone be kind enough to benchmark this against MLX itself and share the results? Useful numbers would be the time taken, the amount of VRAM, the context length, and whether the losses match. Of course that's a lot, so just the time and a check that the losses match would be more than helpful. Thank you so much! :)
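One way to report the time and loss comparison requested above is to log per-step losses and wall-clock time for each run, then compare the two loss curves. A minimal sketch, assuming each run exposes a step function returning `(loss, tokens_processed)`; `summarize_run` and `losses_match` are hypothetical helpers, the 5% relative tolerance is an arbitrary choice, and VRAM measurement is not covered:

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple


def summarize_run(step_fn: Callable[[], Tuple[float, int]], num_steps: int) -> Dict:
    """Call step_fn num_steps times. step_fn is assumed to run one training step
    and return (loss, tokens_processed). Returns losses, elapsed seconds and t/s."""
    losses: List[float] = []
    tokens = 0
    start = time.perf_counter()
    for _ in range(num_steps):
        loss, n = step_fn()
        losses.append(loss)
        tokens += n
    elapsed = time.perf_counter() - start
    tokens_per_s = tokens / elapsed if elapsed > 0 else math.inf
    return {"losses": losses, "seconds": elapsed, "tokens_per_s": tokens_per_s}


def losses_match(a: Sequence[float], b: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True when both runs logged the same number of steps and every pair of
    per-step losses agrees within rel_tol (relative tolerance)."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))
```

Running `summarize_run` once for the Unsloth run and once for the MLX run (same data, same number of steps) and passing both `losses` lists to `losses_match` gives a quick pass/fail, while `seconds` and `tokens_per_s` cover the timing question.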
Sorry for the delay.
Unsloth run:
MLX run:
I can already see that the parameters need to be reviewed, since the trainable-parameter percentage differs between the two runs.
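The trainable percentage is 100 × trainable / total parameters. With LoRA, each adapted weight matrix of shape d_out × d_in adds r·(d_in + d_out) trainable parameters, so runs that use different ranks or different target modules report different percentages. A minimal sketch with hypothetical layer shapes (not the configurations used in either run above); `lora_params` and `trainable_percent` are illustrative helpers:

```python
from typing import Iterable, Tuple


def lora_params(shapes: Iterable[Tuple[int, int]], rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted matrix,
    i.e. rank * (d_in + d_out) extra trainable parameters."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def trainable_percent(base_params: int, trainable: int) -> float:
    """Percentage of all parameters (frozen base + adapters) that are trainable."""
    return 100.0 * trainable / (base_params + trainable)


# Hypothetical example: four 4096 x 4096 projections adapted at rank 8 vs rank 16.
# "base" counts only these four frozen matrices, not a whole model.
shapes = [(4096, 4096)] * 4
base = 4 * 4096 * 4096
for r in (8, 16):
    t = lora_params(shapes, r)
    print(f"rank={r}: trainable={t:,} ({trainable_percent(base, t):.3f}%)")
```

Doubling the rank roughly doubles the percentage, and adapting extra modules (for example the MLP projections) raises it further, so matching rank and target modules is a prerequisite for a fair loss comparison.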
I was able to make this work, thanks for this! However, unsloth-zoo==2024.11.4 did not work for me because some functions were missing; I was able to make it run with version 2024.11.6.
Any plans to get this merged soon? I really need this feature.
Hey y'all, thanks a lot for all the tests, and thanks once again @shashikanth-a for the PR. We'll hopefully be doing a PR review and benchmarking tests next week! Thanks @mkemka as well for the test, we appreciate it.
Cool stuff! Also, if you've tried KoboldCPP (based on llama.cpp), it has features like prompt caching: when I message the model, it only processes the new tokens because everything before them has already been processed, so responses come back fast. How can I get that with Unsloth?
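Prompt caching as described above reuses work already done for a shared prefix: if a new prompt starts with tokens the model has already processed, only the remaining tokens need a forward pass. A toy sketch of that bookkeeping, where the hypothetical `process_tokens` stands in for a forward pass that extends a per-token cache (this is not an Unsloth or KoboldCPP API; in llama.cpp-style runtimes the cached state is the attention KV cache):

```python
from typing import List, Tuple


def process_tokens(tokens: List[int]) -> List[int]:
    """Hypothetical stand-in for a model forward pass: one state entry per token."""
    return [t * 2 for t in tokens]


class PrefixCache:
    """Toy prefix cache: remembers the tokens already processed and the state
    computed for them, and only processes tokens after the shared prefix."""

    def __init__(self) -> None:
        self.tokens: List[int] = []
        self.state: List[int] = []  # stand-in for a per-token KV cache

    def _common_prefix_len(self, tokens: List[int]) -> int:
        n = 0
        for cached, new in zip(self.tokens, tokens):
            if cached != new:
                break
            n += 1
        return n

    def process(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused, computed): tokens served from the cache vs. processed."""
        keep = self._common_prefix_len(tokens)
        # Discard cached entries after the first mismatch, then compute the rest.
        self.tokens = self.tokens[:keep]
        self.state = self.state[:keep]
        new = tokens[keep:]
        self.state.extend(process_tokens(new))
        self.tokens.extend(new)
        return keep, len(new)
```

In a chat, each turn resends the whole conversation, so `process` reuses everything up to the new message and computes only the appended tokens; editing an earlier message invalidates the cache from that point on.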
`cmake -DCOMPUTE_BACKEND=mps -S .` for building bitsandbytes
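The from-source bitsandbytes build can be scripted so the backend is picked per machine. A minimal sketch, assuming it runs from a bitsandbytes checkout and that the build follows a configure, `make`, `pip install -e .` sequence; `default_backend`, `build_commands` and `run` are hypothetical helpers, and the script only prints the commands unless `--run` is passed:

```python
import platform
import shlex
import subprocess
import sys
from typing import List


def default_backend() -> str:
    """Pick 'mps' on Apple Silicon (macOS on arm64), otherwise fall back to 'cpu'."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cpu"


def build_commands(backend: str) -> List[List[str]]:
    """Assumed configure / compile / install steps for a bitsandbytes checkout."""
    return [
        ["cmake", f"-DCOMPUTE_BACKEND={backend}", "-S", "."],
        ["make"],
        [sys.executable, "-m", "pip", "install", "-e", "."],
    ]


def run(backend: str, dry_run: bool = True) -> None:
    for cmd in build_commands(backend):
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(default_backend(), dry_run="--run" not in sys.argv)
```

Keeping dry-run as the default makes it easy to check which `COMPUTE_BACKEND` value will be used before anything is compiled.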