
Getting errors when trying to run llama.cpp example. #977

Closed
Brentably opened this issue Nov 18, 2024 · 4 comments

@Brentably

[Screenshot 2024-11-17 at 20 03 21]
@charlesfrye (Collaborator)

Thanks for reporting! I'm not able to reproduce this one. Did you make any changes to the code in the example?

@charlesfrye (Collaborator)

PS: I noticed that the model download in this one was pretty slow -- just CURLing Hugging Face. I switched it over to the faster hf_transfer library in #978, which dropped download times from several minutes to under 30 seconds.

That should make it easier to iterate on the Image definition. You can also `modal shell` into the container and poke around, running commands until the issue is fixed, then copy those commands into the Image definition at the end.
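For reference, a minimal sketch of a download function with hf_transfer enabled on the Image (the repo ID, file name, and timeout here are illustrative assumptions, not necessarily what #978 uses):

```python
import modal

# Enable hf_transfer so huggingface_hub uses the fast Rust downloader.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("huggingface_hub[hf_transfer]")
    .env({"HF_HUB_ENABLE_HF_TRANSFER": "1"})
)

app = modal.App("llama-cpp-download", image=image)

# Assumed repo/file names for illustration only.
MODEL_REPO = "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF"
MODEL_FILE = "Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf"


@app.function(timeout=30 * 60)
def download_model() -> str:
    from huggingface_hub import hf_hub_download

    # hf_transfer is picked up automatically because HF_HUB_ENABLE_HF_TRANSFER=1
    # is set on the image above.
    return hf_hub_download(repo_id=MODEL_REPO, filename=MODEL_FILE, local_dir="/")
```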

@mpr1255 commented Jan 15, 2025

Same error when I tried to use it for the first time. Errored on Mac, switched to Linux, same error. Full error message:

uvx modal run 06_gpu_and_ml/llm-serving/llama_cpp.py
✓ Initialized. View run at https://modal.com/apps/mpr1255/main/ap-B34Wncw2b37KobBEXJFjCg
✓ Created objects.
├── 🔨 Created mount /home/ubuntu/modal_test/modal-examples/06_gpu_and_ml/llm-serving/llama_cpp.py
├── 🔨 Created function download_model.
└── 🔨 Created function llama_cpp_inference.
/build/bin/llama-cli: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.13' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /build/bin/llama-cli)
Traceback (most recent call last):
  File "/pkg/modal/_runtime/container_io_manager.py", line 741, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 240, in run_input_sync
    res = io_context.call_finalized_function()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pkg/modal/_runtime/container_io_manager.py", line 180, in call_finalized_function
    res = self.finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama_cpp.py", line 80, in llama_cpp_inference
    subprocess.run(
  File "/usr/local/lib/python3.11/subprocess.py", line 569, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']' returned non-zero exit status 1.
Stopping app - uncaught exception raised locally: CalledProcessError(1, ['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ubuntu/modal_test/modal-examples/06_gpu_and_ml/llm-serving/llama_cpp.py:96 in main         │
│                                                                                                  │
│   95 def main(prompt: str = None, num_output_tokens: int = None):                                │
│ ❱ 96 │   llama_cpp_inference.remote(prompt, num_output_tokens)                                   │
│   97                                                                                             │
│                                                                                                  │
│               ...Remote call to Modal Function (ta-01JHMPKBM99D9GWFX1MKQ3Z8CB)...                │
│                                                                                                  │
│ /root/llama_cpp.py:80 in llama_cpp_inference                                                     │
│                                                                                                  │
│ ❱ 80 subprocess.run(                                                                             │
│                                                                                                  │
│                                                                                                  │
│ /usr/local/lib/python3.11/subprocess.py:569 in run                                               │
│                                                                                                  │
│ ❱ 569 raise CalledProcessError(retcode, process.args,                                            │
│                                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']' returned non-zero exit status 1.
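An aside on the `GLIBCXX_3.4.29` / `GLIBC_2.3x` lines above: they mean the prebuilt llama-cli binary was linked against a newer glibc/libstdc++ than the container's base image provides, so the dynamic loader refuses to run it. A minimal sketch of one way to avoid that kind of mismatch, assuming you build llama.cpp inside the same image that runs it (the base image tag, paths, and CMake invocation here are assumptions for illustration, not the example's actual definition):

```python
import modal

# Build llama-cli against the same glibc/libstdc++ that the container will run it with.
# ubuntu:22.04, the clone path, and the CMake flags are assumptions, not the example's fix.
image = (
    modal.Image.from_registry("ubuntu:22.04", add_python="3.11")
    .apt_install("git", "build-essential", "cmake")
    .run_commands(
        "git clone https://github.com/ggerganov/llama.cpp /llama.cpp",
        "cmake -S /llama.cpp -B /llama.cpp/build -DCMAKE_BUILD_TYPE=Release",
        "cmake --build /llama.cpp/build --target llama-cli -j",
        # the freshly built binary lands at /llama.cpp/build/bin/llama-cli
    )
)
```

Because the binary and the runtime then share one base image, the loader finds matching GLIBC/GLIBCXX symbol versions; from a `modal shell` you can check what a base image ships with, e.g. `strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX`.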

@charlesfrye (Collaborator)

We updated the llama.cpp example to run DeepSeek-R1 on GPU. There's also a (new) code path for running Phi-4 on CPU. If the same error recurs there, please re-open and I'll investigate!
