
Getting errors when trying to run llama.cpp example. #977

Closed
Brentably opened this issue Nov 18, 2024 · 4 comments

@Brentably

[Screenshot 2024-11-17 at 20 03 21]
@charlesfrye (Collaborator)

Thanks for reporting! I'm not able to reproduce this one. Did you make any changes to the code in the example?

@charlesfrye (Collaborator)

PS: I noticed that the model download in this one was pretty slow -- just CURLing Hugging Face. I switched it over to the faster hf_transfer library in #978, which dropped download times from several minutes to under 30 seconds.

That should make it easier to iterate on the Image definition. You can also `modal shell` into the container and poke around, running commands until the issue is fixed, then copy those commands into the Image definition at the end.
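For reference, a minimal sketch of a download function with hf_transfer enabled on the Image (the repo ID, file name, and timeout here are illustrative assumptions, not necessarily what #978 uses):

```python
import modal

# Enable hf_transfer so huggingface_hub uses the fast Rust downloader.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("huggingface_hub[hf_transfer]")
    .env({"HF_HUB_ENABLE_HF_TRANSFER": "1"})
)

app = modal.App("llama-cpp-download", image=image)

# Assumed repo/file names for illustration only.
MODEL_REPO = "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF"
MODEL_FILE = "Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf"


@app.function(timeout=30 * 60)
def download_model() -> str:
    from huggingface_hub import hf_hub_download

    # hf_transfer is picked up automatically because HF_HUB_ENABLE_HF_TRANSFER=1
    # is set on the image above.
    return hf_hub_download(repo_id=MODEL_REPO, filename=MODEL_FILE, local_dir="/")
```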

@mpr1255 commented Jan 15, 2025

Same error when I tried to use it for the first time. Errored on Mac, switched to Linux, same error. Full error message:

uvx modal run 06_gpu_and_ml/llm-serving/llama_cpp.py
✓ Initialized. View run at https://modal.com/apps/mpr1255/main/ap-B34Wncw2b37KobBEXJFjCg
✓ Created objects.
├── 🔨 Created mount /home/ubuntu/modal_test/modal-examples/06_gpu_and_ml/llm-serving/llama_cpp.py
├── 🔨 Created function download_model.
└── 🔨 Created function llama_cpp_inference.
/build/bin/llama-cli: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.13' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /build/bin/llama-cli)
/build/bin/llama-cli: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /build/bin/llama-cli)
Traceback (most recent call last):
  File "/pkg/modal/_runtime/container_io_manager.py", line 741, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 240, in run_input_sync
    res = io_context.call_finalized_function()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pkg/modal/_runtime/container_io_manager.py", line 180, in call_finalized_function
    res = self.finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama_cpp.py", line 80, in llama_cpp_inference
    subprocess.run(
  File "/usr/local/lib/python3.11/subprocess.py", line 569, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']' returned non-zero exit status 1.
Stopping app - uncaught exception raised locally: CalledProcessError(1, ['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ubuntu/modal_test/modal-examples/06_gpu_and_ml/llm-serving/llama_cpp.py:96 in main         │
│                                                                                                  │
│   95 def main(prompt: str = None, num_output_tokens: int = None):                                │
│ ❱ 96 │   llama_cpp_inference.remote(prompt, num_output_tokens)                                   │
│   97                                                                                             │
│                                                                                                  │
│               ...Remote call to Modal Function (ta-01JHMPKBM99D9GWFX1MKQ3Z8CB)...                │
│                                                                                                  │
│ /root/llama_cpp.py:80 in llama_cpp_inference                                                     │
│                                                                                                  │
│ ❱ 80 subprocess.run(                                                                             │
│                                                                                                  │
│                                                                                                  │
│ /usr/local/lib/python3.11/subprocess.py:569 in run                                               │
│                                                                                                  │
│ ❱ 569 raise CalledProcessError(retcode, process.args,                                            │
│                                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/build/bin/llama-cli', '-m', '/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf', '-n', '128', '-p', 'Write a poem about New York City.\n']' returned non-zero exit status 1.
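An aside on the `GLIBCXX_3.4.29` / `GLIBC_2.3x` lines above: they mean the prebuilt llama-cli binary was linked against a newer glibc/libstdc++ than the container's base image provides, so the dynamic loader refuses to run it. A minimal sketch of one way to avoid that kind of mismatch, assuming you build llama.cpp inside the same image that runs it (the base image tag, paths, and CMake invocation here are assumptions for illustration, not the example's actual definition):

```python
import modal

# Build llama-cli against the same glibc/libstdc++ that the container will run it with.
# ubuntu:22.04, the clone path, and the CMake flags are assumptions, not the example's fix.
image = (
    modal.Image.from_registry("ubuntu:22.04", add_python="3.11")
    .apt_install("git", "build-essential", "cmake")
    .run_commands(
        "git clone https://github.com/ggerganov/llama.cpp /llama.cpp",
        "cmake -S /llama.cpp -B /llama.cpp/build -DCMAKE_BUILD_TYPE=Release",
        "cmake --build /llama.cpp/build --target llama-cli -j",
        # the freshly built binary lands at /llama.cpp/build/bin/llama-cli
    )
)
```

Because the binary and the runtime then share one base image, the loader finds matching GLIBC/GLIBCXX symbol versions; from a `modal shell` you can check what a base image ships with, e.g. `strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX`.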

@charlesfrye (Collaborator)

We updated the llama.cpp example to run DeepSeek-R1 on GPU. There's also a (new) code path for running Phi-4 on CPU. If the same error recurs there, please re-open and I'll investigate!
