Not able to run Llama 7B float16 on my system or Google Colab #25
Comments
Hi, did you resolve the issue?
Nope, I didn't get any response, so I left the thread. But it is worth checking out again.
I was able to run the minimum example with Python 3.10.13 and NO CUDA: I'm using the CPU for inference because my GPU has limited memory. So instead of installing ONNX Runtime with `pip install torch onnxruntime-gpu`, I installed it with `pip install torch onnxruntime`.
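For anyone trying the same CPU-only route, a quick sanity check (a minimal sketch, not something from the repo itself) is to ask ONNX Runtime which execution providers the installed package actually exposes:

```python
# Quick check after "pip install torch onnxruntime" (CPU-only build).
import onnxruntime as ort

# With the CPU-only package this should list only CPUExecutionProvider;
# the onnxruntime-gpu package would also list CUDAExecutionProvider.
print(ort.get_available_providers())
```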
That's awesome, but the time gap is too large to assess how well the runtime is working. But I will check it out as well.
Yes, it's slow. And I had to delete and recreate the Python virtual environment several times. Initially I installed `onnxruntime-gpu`, uninstalled it and installed `onnxruntime` (the CPU version), but I got other errors, so I deleted and recreated the virtual environment.
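As a side note, a generic way to spot the mixed-package situation described above (just a sketch, not part of the repo) is to list which ONNX Runtime distributions are installed before deciding whether to recreate the environment:

```python
# Generic sanity check: having both onnxruntime and onnxruntime-gpu installed
# in the same environment is a common source of execution-provider errors.
from importlib import metadata

installed = []
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name and name.lower().startswith("onnxruntime"):
        installed.append(name)

print("Installed ONNX Runtime packages:", sorted(installed))
if len(installed) > 1:
    print("More than one ONNX Runtime package found; consider recreating "
          "the virtual environment with only one of them.")
```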
Wow, that's a lot of ifs and so on, but yeah, got it. Thanks for the workaround.
Hello all, I actually came across the same problem but with the 7B_FT_float32 model. I have two GPUs with 24 GB of memory each, but as far as I understand, running the 7B_FT_float32 model needs a minimum of 25 GB of GPU memory. So, is there a way to run this on my device? Is it possible to run ONNX Runtime on multiple GPUs?
Yes, running on multiple GPUs would be very useful.
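For the multi-GPU question: as far as I know, a single ONNX Runtime InferenceSession does not split one model across GPUs, but you can choose which GPU a session runs on through the CUDAExecutionProvider device_id option. A minimal sketch, assuming onnxruntime-gpu is installed and using a placeholder model path:

```python
# Sketch: pinning an ONNX Runtime session to a specific GPU.
# Requires onnxruntime-gpu; "model.onnx" is a placeholder path.
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {"device_id": 1}),  # run on the second GPU
    "CPUExecutionProvider",                       # fall back to CPU if needed
]
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())
```

This only selects the device; fitting a model that needs more memory than a single GPU has would still require splitting it across sessions or using a smaller or quantized variant.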
I have been testing the repo on my laptop and in Google Colab. Here is the system information for both environments.
My local system:
Google Colab:
Command to reproduce:
Output on my local system:
Output in Google Colab:
This probably means the process is automatically getting killed.
So now I have two questions here:
1. Why is it trying to use the DmlExecutionProvider and giving an error?
2. Why does it run for 52-60 seconds in Google Colab (after which I use ^C to kill the process) and 10-15 seconds on my local machine (after which it gives the error)?

Update:
I made some changes in the example code just to provide the CPUExecutionProvider. Then I ran the same command; it took more than 2.5 minutes and the process finally got killed. It seems like I might not have the correct CUDA vs ONNX Runtime compatibility, which could be generating the error.
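For reference, the kind of change described above presumably looks something like this (a sketch with a placeholder model path, not the repo's actual example code): pass only the CPUExecutionProvider when creating the session so ONNX Runtime never tries the CUDA or DirectML providers.

```python
# Sketch of forcing CPU-only execution; "model.onnx" is a placeholder path.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CPUExecutionProvider"],  # skip CUDA/DirectML entirely
)
```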