Releases: OpenBMB/llama.cpp

b3645

30 Aug 08:08
7ea8d80
llava : the function "clip" should be int (#9237)
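The fix above changes a return type in llava's clip.cpp. As a hedged, minimal sketch (Python, with a hypothetical signature) of why a clamped value used as an index must be integral:

```python
def clip(x: float, lower: int, upper: int) -> int:
    # Hypothetical sketch: image preprocessing clamps a coordinate and then
    # uses it as an array index, so the clamped result must be an int,
    # not a float.
    return int(max(lower, min(x, upper)))
```

Returning a float here would silently break the indexing done at the call sites.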

b3615

22 Aug 10:22
1731d42
[SYCL] Add oneDNN primitive support (#9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16 for src, weights, and dst (fp16 × fp16 → fp16)

* update doc
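The "engine map" and "use dnnl for intel only" bullets above can be pictured with the sketch below. This is a Python stand-in with invented names, not the actual SYCL/C++ backend code:

```python
# Stand-in cache: one engine per device, created lazily. The device-id
# strings and the returned objects are assumptions, not llama.cpp API.
_engine_map = {}

def get_dnnl_engine(device_id: str):
    """Return a cached per-device engine, but only for Intel devices."""
    if not device_id.startswith("intel"):
        return None  # non-Intel devices keep the plain SYCL path
    if device_id not in _engine_map:
        _engine_map[device_id] = object()  # stands in for dnnl::engine
    return _engine_map[device_id]
```

Caching per device avoids re-creating an engine (and its stream) for every primitive invocation.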

b3621

10 Aug 09:43
fc1c860
Merge branch 'prepare-PR-of-minicpm-v2.6' into master

b3209

24 Jun 04:04
95f57bb
ggml : remove ggml_task_type and GGML_PERF (#8017)

* ggml : remove ggml_task_type and GGML_PERF

* check abort_callback on main thread only

* vulkan : remove usage of ggml_compute_params

* remove LLAMA_PERF
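The "check abort_callback on main thread only" bullet can be illustrated with a hedged sketch; the names are hypothetical, not ggml's actual structs:

```python
def should_abort(ith: int, abort_callback) -> bool:
    # Only thread 0 polls the callback; worker threads skip it so the
    # (possibly expensive) user callback is not invoked once per thread.
    if ith == 0 and abort_callback is not None:
        return abort_callback()
    return False
```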

b3078

04 Jun 07:37
bde7cd3
llama : offload to RPC in addition to other backends (#7640)

* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <[email protected]>

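The "add endpoint to all RPC object names" bullet suggests namespacing objects by the server they live on. A minimal sketch of that idea (the separator and function name are assumptions):

```python
def rpc_object_name(endpoint: str, name: str) -> str:
    # Prefixing with the endpoint keeps object names unique when several
    # RPC servers are attached at the same time.
    return f"{endpoint}|{name}"
```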

b3026

28 May 20:05
5442939
llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116.
Still needs some more changes to properly support Granite.
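An optional MLP bias can be pictured as below. This is a plain-Python sketch with a stand-in activation, not the actual ggml graph code (Llama's FFN really uses SiLU with a gate projection):

```python
def ffn(x, w_up, w_down, b_up=None, b_down=None):
    # Up-projection, with an optional bias (present in Granite
    # checkpoints, absent in vanilla Llama).
    h = [sum(w * xi for w, xi in zip(row, x)) for row in w_up]
    if b_up is not None:
        h = [hi + b for hi, b in zip(h, b_up)]
    h = [max(0.0, hi) for hi in h]  # stand-in activation
    # Down-projection, again with an optional bias.
    y = [sum(w * hi for w, hi in zip(row, h)) for row in w_down]
    if b_down is not None:
        y = [yi + b for yi, b in zip(y, b_down)]
    return y
```

When the bias tensors are absent from the checkpoint, the extra adds are simply skipped, so vanilla Llama models are unaffected.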

* llama: honor add_space_prefix from the model configuration

propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.
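In spirit, honoring add_space_prefix looks like the sketch below. This is a Python stand-in with an invented function name; the real logic lives in the tokenizer:

```python
def prepare_for_tokenizer(text: str, add_space_prefix: bool) -> str:
    # When the model configuration asks for it, prepend a single leading
    # space before tokenizing, matching SentencePiece-style behaviour.
    if add_space_prefix and not text.startswith(" "):
        return " " + text
    return text
```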

Signed-off-by: Giuseppe Scrivano <[email protected]>

* llama: add support for small granite models

It works only for the small 3B and 8B models.

The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.
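That detection heuristic can be sketched as follows; the vocabulary-size constant here is a placeholder, not the value the real convert-hf-to-gguf.py checks:

```python
GRANITE_VOCAB_SIZE = 49152  # placeholder value, not the verified constant

def is_granite(hparams: dict) -> bool:
    # The converter keys off the Granite checkpoints' vocabulary size to
    # select the architecture-specific configuration.
    return hparams.get("vocab_size") == GRANITE_VOCAB_SIZE
```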

Signed-off-by: Giuseppe Scrivano <[email protected]>

Co-authored-by: Steffen Roecker <[email protected]>

b3025

28 May 20:04
56411a9
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …

b2979

23 May 11:53
9b82476
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-…